Interval Estimation

Interval Estimation by exclusion of data

Interval estimation can be done without assuming an underlying probability distribution for the data. In classical statistics, for example, we usually have to assume that the error term of the regression model is normally distributed in order to develop a confidence interval for a regression coefficient such as m in y=mx+b.

The interpretation of interval estimation in SWOP is this: in the discovery of knowledge based on observation, effect sizes can be relatively loose or relatively tight. When measuring the gravitational pull of the Earth in physics, we may obtain a relatively tight effect size. In contrast, when measuring effect sizes in certain topics in the humanities and social sciences, we may obtain a relatively loose one. These are only examples. If the interval overlaps the null effect, there may be little or no effect to quantify in the study. This differs from the frequentist interpretation, under which about 19 out of 20 similar studies are expected to produce a 95% confidence interval that contains the population effect size. This is a very important difference between SWOP intervals and frequentist confidence intervals.

The method described here bypasses the normality assumption by generating intervals from each point (x,y) and its influence on m.

In Yi = mXi + b, Yi and Xi are vectors of data with real entries.

m is the regression coefficient and is calculated via least squares. This is well known as simple linear regression.

(x,y)i denotes the ith point of the data.

(1) $\displaystyle inf_i=m_U -m_{U-i}$

In equation (1), infi is the influence of point i on the calculation of m. Its terms are defined as follows:

mU is the effect size m calculated for all points in the dataset, where U denotes the entire dataset.

mU-i is m calculated without the ith data point.
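As a rough illustration of equation (1), the influences can be computed by refitting the slope once per point, each time leaving that point out. The sketch below uses plain Python; the function names `slope` and `influences` and the sample data are made up for illustration and are not part of the method's description.

```python
def slope(xs, ys):
    """Least-squares slope m of the line y = m*x + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


def influences(xs, ys):
    """Return (m_U, [inf_i]) with inf_i = m_U - m_{U-i} for each point i."""
    m_all = slope(xs, ys)  # m_U: slope fitted on the entire dataset
    infs = []
    for i in range(len(xs)):
        # m_{U-i}: slope fitted with the ith point left out
        m_without_i = slope(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        infs.append(m_all - m_without_i)
    return m_all, infs


# Hypothetical sample data (Python lists), for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.5]
m_all, infs = influences(xs, ys)
print("m_U =", m_all)
print("inf_i =", infs)
```

A point with a positive infi pulls m upward (removing it lowers the slope), and a point with a negative infi pulls m downward.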

We then sort the data by descending values of infi. The set of data comprising the top 15% of infi is labeled “UL” for upper limit, and the set comprising the bottom 15% of infi is labeled “LL” for lower limit.

Excluding the UL set of data, we calculate the total upper influence statistic

$\displaystyle infTot_{UL}=\sum_{i \in U \setminus UL}{(m_U - m_{U-i})}$

The sum above runs over every point except those in the UL subset.

$\displaystyle infTot_{LL}= \sum_{i \in U \setminus LL}{(m_U - m_{U-i})}$

The sum above runs over every point except those in the LL subset.

The confidence limits obtained by excluding the influence of 15% of the data are:

$\displaystyle upd_{15}= m_U + infTot_{UL}$

$\displaystyle lpd_{15} = m_U + infTot_{LL}$
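A minimal sketch of the exclusion-of-data limits, applying the formulas above literally. The names `slope` and `exclusion_interval`, the parameter `frac`, the sample data, and the choice to round the number of excluded points down are all assumptions made for illustration. The sketch makes no assumption about which of the two limits is numerically larger.

```python
def slope(xs, ys):
    """Least-squares slope m of the line y = m*x + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


def exclusion_interval(xs, ys, frac=0.15):
    """Return (m_U, upd, lpd) for an excluded share `frac` of the data."""
    m_all = slope(xs, ys)
    # inf_i = m_U - m_{U-i}, one leave-one-out refit per point
    infs = [
        m_all - slope(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        for i in range(len(xs))
    ]
    ranked = sorted(infs, reverse=True)  # descending inf_i
    n = len(ranked)
    k = int(frac * n)  # assumption: size of UL and LL, rounded down

    inf_tot_ul = sum(ranked[k:])      # every point except the top k (UL)
    inf_tot_ll = sum(ranked[:n - k])  # every point except the bottom k (LL)

    upd = m_all + inf_tot_ul
    lpd = m_all + inf_tot_ll
    return m_all, upd, lpd


# Hypothetical sample data (Python lists), for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
ys = [2.3, 3.8, 6.1, 8.4, 9.7, 12.2, 13.9, 16.5, 17.8, 20.4]
m_all, upd15, lpd15 = exclusion_interval(xs, ys, frac=0.15)
print("m_U =", m_all, "upd15 =", upd15, "lpd15 =", lpd15)
```

With 10 points and `frac=0.15`, one point is excluded at each end; other shares can be tried by changing `frac`.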

This method does not rely on the probability distribution of any variable Xi or Yi. It does not rely on the equation relating Xi and Yi. It does not even require the subjects i to be independent or uncorrelated. It only requires that m can be calculated reproducibly by some method such as least squares.

Using this method, we can generate confidence intervals at different levels by excluding anywhere from less than 1% to just under 50% of the data at either end after sorting by infi.

Interval Estimation by exclusion of effect

By summing the positive and negative components of the influence statistic infi separately, we get the Positive Sum Coefficient and the Negative Sum Coefficient.

$\displaystyle PosSumCoef = \sum{(m_U - m_{U-i})}$ for i where $\displaystyle {m_U - m_{U-i}}>0$

$\displaystyle NegSumCoef = \sum{(m_U - m_{U-i})}$ for i where $\displaystyle {m_U - m_{U-i}}<0$

The upper and lower confidence limits at a 30% exclusion of effect are as below:

$\displaystyle upe30coef = {m_U}+PosSumCoef\times 0.3$

$\displaystyle lpe30coef = {m_U}+NegSumCoef\times 0.3$
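The exclusion-of-effect limits can be sketched in the same way. The names `slope` and `effect_interval`, the parameter `pct`, and the sample data are hypothetical and chosen only for illustration; the arithmetic follows the PosSumCoef and NegSumCoef formulas above.

```python
def slope(xs, ys):
    """Least-squares slope m of the line y = m*x + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


def effect_interval(xs, ys, pct=0.3):
    """Return (m_U, upe, lpe) for a share `pct` of the summed influence."""
    m_all = slope(xs, ys)
    # inf_i = m_U - m_{U-i}, one leave-one-out refit per point
    infs = [
        m_all - slope(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        for i in range(len(xs))
    ]
    pos_sum_coef = sum(v for v in infs if v > 0)  # PosSumCoef
    neg_sum_coef = sum(v for v in infs if v < 0)  # NegSumCoef

    upe = m_all + pos_sum_coef * pct
    lpe = m_all + neg_sum_coef * pct
    return m_all, upe, lpe


# Hypothetical sample data (Python lists), for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
ys = [2.3, 3.8, 6.1, 8.4, 9.7, 12.2, 13.9, 16.5, 17.8, 20.4]
m_all, upe30, lpe30 = effect_interval(xs, ys, pct=0.3)
print("m_U =", m_all, "upe30coef =", upe30, "lpe30coef =", lpe30)
```

Because PosSumCoef is never negative and NegSumCoef is never positive, this construction always places m_U between the two limits.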
