Sample Size Verification Method using Statistics without Probability

In Statistics Without Probability (SWOP), the required sample size is proportional to the variability of the variables in the regression model. Variability is a surrogate for noise in the data. According to simulations, noise is precisely linearly related to the Q-value. The Q-value is used for hypothesis testing, with higher Q-values denoting more statistically significant effects (i.e. an effect that persists despite noise, regardless of its magnitude or direction). The Q-value is therefore proportional to the required sample size. However, the Q-value cannot be determined until an adequate sample size is reached. Similarly, the effect size only stabilises once an adequate sample size is reached. Therefore, check whether the Q-value and the effect size stabilise as the sample size increases; stabilisation indicates improved robustness.

The following method can be used to determine whether more data is needed. It works as data is collected rather than before data collection.

Sample Size Method
1. Randomly pick subsets of the data of increasing size. Alternatively, all possible combinations of subsets of increasing size can be used.

2. Measure the fluctuations in the point estimate across the subsets of each size.

3. The point estimate and other statistics mentioned in this paper should increase in precision as the subsets of data increase in size.

4. Once the change in fluctuations falls below a cutoff as the subset size increases, the sample size of the total data is sufficient.

5. If the fluctuations of the point estimate do not settle, more data is needed.
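
The steps above can be sketched in Python. This is only a minimal illustration under stated assumptions, not the SWOP implementation: the ordinary least squares slope stands in for the SWOP point estimate, the standard deviation of that slope across random subsets stands in for "fluctuation", and the function name `stabilisation_check`, the number of draws and the cutoff value are all hypothetical choices made for the example.

```python
import numpy as np

def stabilisation_check(x, y, sizes, n_draws=200, cutoff=0.05, seed=0):
    """Steps 1-5: track how much a point estimate fluctuates across random
    subsets of increasing size, and report whether it has settled."""
    rng = np.random.default_rng(seed)
    spreads = []
    for m in sizes:
        slopes = []
        for _ in range(n_draws):
            # Step 1: random subset of size m, drawn without replacement.
            idx = rng.choice(len(x), size=m, replace=False)
            # Stand-in point estimate (assumption): the OLS slope of y on x.
            slopes.append(np.polyfit(x[idx], y[idx], 1)[0])
        # Step 2: fluctuation = spread of the estimate across the subsets.
        spreads.append(float(np.std(slopes)))
    # Step 4: change in fluctuation between consecutive subset sizes.
    changes = np.abs(np.diff(spreads))
    # Assumed stopping rule: the most recent change is below the cutoff.
    settled = bool(changes[-1] < cutoff)
    return spreads, settled

# Illustrative data only: 400 points with a true slope of 2 and unit noise.
demo_rng = np.random.default_rng(1)
demo_x = demo_rng.normal(size=400)
demo_y = 2.0 * demo_x + demo_rng.normal(size=400)
demo_spreads, demo_settled = stabilisation_check(
    demo_x, demo_y, sizes=[20, 50, 100, 200, 300]
)
print(demo_spreads, demo_settled)
```

If `settled` comes back `False`, step 5 applies and more data is needed. Subset sizes should stay below the total sample size, because a subset equal to the full data set has no fluctuation by construction.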

You can repeat this method using simulated data with effect sizes similar to those in your data to find the sample size at which the point estimator, interval estimator and hypothesis testing values no longer fluctuate meaningfully.
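
A simulation of this kind might look like the sketch below. It assumes a simple linear model with a single predictor, uses the OLS slope as a stand-in for the SWOP point estimator, and treats "does not fluctuate meaningfully" as "the standard deviation of the estimate across simulated data sets is below a chosen margin". The function name `simulated_sufficient_n` and all parameter values are hypothetical.

```python
import numpy as np

def simulated_sufficient_n(effect, noise_sd, candidate_ns, margin,
                           n_sims=300, seed=0):
    """Return the smallest candidate sample size at which the simulated
    point estimate fluctuates by less than `margin`, or None if none does."""
    rng = np.random.default_rng(seed)
    for n in candidate_ns:
        slopes = np.empty(n_sims)
        for s in range(n_sims):
            # Simulate data with an effect size similar to the real data.
            x = rng.normal(size=n)
            y = effect * x + rng.normal(scale=noise_sd, size=n)
            # Stand-in point estimate (assumption): the OLS slope.
            slopes[s] = np.polyfit(x, y, 1)[0]
        # Fluctuation of the estimate across simulated data sets.
        if slopes.std() < margin:
            return n
    return None

sufficient_n = simulated_sufficient_n(
    effect=0.5, noise_sd=1.0,
    candidate_ns=[25, 50, 100, 200, 400, 800], margin=0.085,
)
print(sufficient_n)
```

The same loop could track an interval estimate or a hypothesis testing value instead of the slope; only the statistic computed inside the inner loop would change.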

Beware of selection bias: estimates can stabilise on a biased sample and still be biased.

Note: Statistics Without Probability overcomes the Jeffreys-Lindley Paradox

The Jeffreys-Lindley Paradox describes how, at very large sample sizes, a result can be significant by its P-value at a given alpha while the Bayes factor for the same data is non-significant, i.e. greater than 1 in favour of the null hypothesis. (Shafer1982Lindley)
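
The paradox can be illustrated numerically. The sketch below assumes the textbook setting of a normal mean with known standard deviation `sigma`, a point null hypothesis of zero, and a normal prior with standard deviation `tau` on the mean under the alternative; these modelling choices and the function names are assumptions made for the example. The z-statistic is held at 1.96, so the two-sided P-value stays at about 0.05 while the Bayes factor in favour of the null grows with the sample size.

```python
import math

def p_value_two_sided(z):
    """Two-sided P-value of a z-statistic under the standard normal."""
    return math.erfc(abs(z) / math.sqrt(2))

def bayes_factor_null(z, n, tau=1.0, sigma=1.0):
    """Bayes factor for H0: mean = 0 against H1: mean ~ Normal(0, tau^2),
    given a z-statistic from n observations with known sd sigma."""
    r = n * tau**2 / sigma**2
    return math.sqrt(1 + r) * math.exp(-0.5 * z**2 * r / (1 + r))

z = 1.96  # just significant at alpha = 0.05, whatever the sample size
for n in (10, 1_000, 100_000):
    print(n, round(p_value_two_sided(z), 3), round(bayes_factor_null(z, n), 2))
```

A Bayes factor above 1 here means the data favour the null hypothesis, even though the P-value is unchanged.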

In SWOP, a larger sample is always better than a smaller one, provided the data is of sufficient quality with regard to measurement error and selection bias. Each additional data point yields a diminishing return in accuracy once a sufficient sample size is reached, but the overall accuracy of the hypothesis test only increases with sample size, which defeats the Jeffreys-Lindley Paradox. (Shafer1982Lindley)

Hypothesis testing in SWOP does not depend on probability values. A sufficient sample size is one at which the point estimate and interval estimate no longer fluctuate by more than a certain margin.

Glenn Shafer. Lindley’s paradox. Journal of the American Statistical Association, 77(378): 325–334, 1982.
