by KadeG
Last Updated June 01, 2017 21:19 PM

I'm working on a statistical stream (time-series) compression algorithm that optimizes its performance as it processes data. It would benefit from statistics that help it determine the sample size it should use at any moment. I've found a few formulas (e.g. Slovin's), but I'm not sure which would work best.

Need: formula for sample size. Given:

- Population size (this changes based on settings)
- Desired confidence level, etc.
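For concreteness, here is a minimal sketch of Slovin's formula, which I mentioned above. Note that Slovin's formula takes a margin of error `e` rather than a confidence level directly (it implicitly assumes maximum variability), so it may only be a starting point:

```python
import math

def slovin_sample_size(population_size, margin_of_error):
    """Slovin's formula: n = N / (1 + N * e^2).

    population_size: N, the total population (varies with settings)
    margin_of_error: e, the tolerated error (e.g. 0.05)
    """
    n = population_size / (1 + population_size * margin_of_error ** 2)
    return math.ceil(n)

# e.g. slovin_sample_size(10000, 0.05) -> 385
```

A formula that takes the confidence level explicitly (such as Cochran's, with a finite-population correction) might be a better fit, which is part of what I'm asking.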

We'd also like the algorithm to weight its current performance at any moment by its confidence in the number of samples taken so far. So, if 300 samples give 95% confidence, what is the confidence at the moment it has taken only 100 samples? I've found little information on this.

Need: formula for confidence level. Given:

- Population size (this changes based on settings)
- Number of samples taken
- Sample statistics (std, mean, etc)
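One way to frame this, assuming a normal approximation: fix a target margin of error, then invert the usual margin formula to get the confidence level achieved after n samples. The target margin and the use of the finite-population correction here are my assumptions, not established parts of the algorithm:

```python
import math

def confidence_at(n, sample_std, target_margin, population_size=None):
    """Confidence level achieved for a fixed margin of error after n samples.

    Inverts the normal-approximation margin e = z * s / sqrt(n)
    (optionally times the finite-population correction), solving for z
    and mapping it to a two-sided confidence level via the normal CDF:
    confidence = 2*Phi(z) - 1 = erf(z / sqrt(2)).
    """
    standard_error = sample_std / math.sqrt(n)
    if population_size is not None and population_size > 1:
        # finite population correction, relevant since N changes with settings
        standard_error *= math.sqrt((population_size - n) / (population_size - 1))
    z = target_margin / standard_error
    return math.erf(z / math.sqrt(2))
```

Under this framing, confidence at 100 samples is simply lower than at 300 for the same margin, which seems to match the behavior I want, but I'd like confirmation that this is statistically sound.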

The end result here is a function that takes the confidence and algorithm performance at a moment and returns a probability of short-circuiting the sampling process (the compression process). Meaning, if it's performing very badly with some settings, it can quit sampling early and move to new settings (cut its losses); while if it's performing well with some settings, it will collect the whole sample (don't argue with the gift horse).
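The decision function I have in mind would look something like the sketch below. The performance threshold and the linear weighting of shortfall by confidence are placeholder assumptions; choosing them well is exactly the open question:

```python
import random

def should_short_circuit(confidence, performance, perf_threshold, rng=random):
    """Decide whether to abandon sampling under the current settings.

    The quit probability rises as performance falls below the threshold,
    weighted by how much we trust the estimate so far (confidence).
    Assumes perf_threshold > 0; both weighting and threshold are
    hypothetical knobs, not part of any established method.
    """
    shortfall = max(0.0, (perf_threshold - performance) / perf_threshold)
    quit_probability = confidence * shortfall
    return rng.random() < quit_probability
```

At low confidence the algorithm keeps sampling even when performance looks bad (the estimate isn't trustworthy yet); at high confidence with poor performance it quits quickly.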
