I'm working on a statistical stream (time-series) compression algorithm that optimizes its performance as it processes data. It would benefit from a statistical rule for determining the sample size it should use at any moment. I've found a few formulas (e.g. Slovin's), but I'm not sure which would work best.
Need: a formula for sample size, given the above.
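For what it's worth, here is a minimal Python sketch of the two usual candidates: Slovin's formula and Cochran's formula (the latter with an optional finite-population correction). Slovin's is often described as Cochran's with p = 0.5 and z ≈ 2 (roughly 95% confidence) baked in, so Cochran's is the more flexible starting point. The parameter names are my own:

```python
import math
from statistics import NormalDist
from typing import Optional

def slovin_sample_size(population: int, margin_of_error: float) -> int:
    """Slovin's formula: n = N / (1 + N * e^2)."""
    return math.ceil(population / (1 + population * margin_of_error ** 2))

def cochran_sample_size(confidence: float, margin_of_error: float,
                        p: float = 0.5,
                        population: Optional[int] = None) -> int:
    """Cochran's formula, with finite-population correction if N is known.
    p = 0.5 is the conservative choice (it maximizes the required n)."""
    z = NormalDist().inv_cdf((1 + confidence) / 2)  # two-sided critical value
    n0 = z ** 2 * p * (1 - p) / margin_of_error ** 2
    if population is not None:
        n0 /= 1 + (n0 - 1) / population  # finite-population correction
    return math.ceil(n0)

print(slovin_sample_size(10_000, 0.05))  # 385
print(cochran_sample_size(0.95, 0.05))   # 385
```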
We'd also like the algorithm to weight its current performance at any moment by its confidence in the number of samples taken so far. So, if 300 samples give 95% confidence, what is the confidence at the moment it has taken only 100 samples? I've found little information on this.
Need: a formula for confidence level at a given sample count, given the above.
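One way to get that number, assuming the Cochran-style relation above with the margin of error held fixed: solve n = z² · p(1 − p) / e² for z, then map z back through the normal CDF. (Confidence and margin of error trade off against each other, so "confidence at 100 samples" is only well-defined once one of them is pinned down.) A sketch under those assumptions:

```python
from math import sqrt
from statistics import NormalDist

def confidence_at(n: int, margin_of_error: float, p: float = 0.5) -> float:
    """Confidence reached after n samples at a fixed margin of error,
    from inverting n = z^2 * p * (1 - p) / e^2 for z."""
    z = margin_of_error * sqrt(n / (p * (1 - p)))
    return 2 * NormalDist().cdf(z) - 1

print(confidence_at(385, 0.05))  # ~0.95, the full sample
print(confidence_at(100, 0.05))  # ~0.68, partway through
```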
The end result here is a function that takes the confidence and the algorithm's performance at a moment and returns a probability of short-circuiting the sampling process (the compression process). Meaning, if it's performing very badly with some settings, it can quit sampling early and move to new settings (cut its losses); while if it's performing well with some settings, it will collect the whole sample (don't argue with the gift horse).
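To make the shape of that function concrete, here is a purely hypothetical sketch. The normalization of `performance` to [0, 1], the `threshold` cutoff, and the linear weighting by confidence are all my own assumptions, not anything from the literature:

```python
import random

def short_circuit_probability(performance: float, confidence: float,
                              threshold: float = 0.5) -> float:
    """Probability of abandoning the current settings early.

    `performance` is the compression score so far, normalized to [0, 1]
    (1 = compressing well); `threshold` is the level below which the
    settings aren't worth finishing (illustrative value). The shortfall
    below the threshold is weighted by how much we trust the sample so far.
    """
    shortfall = max(0.0, (threshold - performance) / threshold)
    return confidence * shortfall

def should_quit(performance: float, confidence: float) -> bool:
    return random.random() < short_circuit_probability(performance, confidence)
```

With this weighting, low confidence early in the sample suppresses rash quitting, and performance at or above the threshold never triggers a short-circuit, which matches the cut-its-losses / gift-horse behavior described above.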