Confidence interval

A confidence interval is a range of plausible values for a statistical parameter, estimated from a sample of data. It gives an idea of the precision of our estimate of the parameter. The confidence interval is usually expressed with an associated confidence level, which represents the proportion of intervals, constructed in the same way from repeated samples, that would contain the true population parameter.

Mean:

The confidence interval of the mean is a range of values within which the true mean of a population plausibly lies. It is constructed from the data of a sample of this population.

A confidence interval for the mean can be constructed thanks to the central limit theorem. For sufficiently large samples (n≥30), whatever the shape of the population distribution, if several samples of size n are taken at random, the means \overline{X} of these samples are approximately normally distributed. This makes it possible to construct reliable confidence intervals for estimating the true population mean.
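
The behaviour described by the central limit theorem can be illustrated with a small simulation. This is only a sketch under stated assumptions: the exponential population (mean 1, standard deviation 1), the fixed seed and the number of repetitions are arbitrary choices made for the illustration and do not come from the text.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is reproducible

n = 30              # sample size, matching the n >= 30 rule of thumb
num_samples = 2000  # number of repeated samples (arbitrary)

# Skewed population chosen for illustration: exponential distribution
# with mean 1 and standard deviation 1.
sample_means = [
    statistics.mean([random.expovariate(1.0) for _ in range(n)])
    for _ in range(num_samples)
]

mean_of_means = statistics.mean(sample_means)
sd_of_means = statistics.stdev(sample_means)
expected_sd = 1.0 / n ** 0.5  # sigma / sqrt(n)

# Share of sample means within 1.96 * sigma / sqrt(n) of the population
# mean; a normal distribution would give about 95 %.
share_within = sum(
    abs(m - 1.0) <= 1.96 * expected_sd for m in sample_means
) / num_samples

print(f"mean of sample means: {mean_of_means:.3f}")
print(f"std dev of sample means: {sd_of_means:.3f} (sigma/sqrt(n) = {expected_sd:.3f})")
print(f"share within 1.96 sigma/sqrt(n): {share_within:.3f}")
```

Even though the population is far from normal, the sample means should cluster around the population mean with a spread close to sigma/sqrt(n).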

To construct a confidence interval for the mean, we use either Student's t-distribution or the normal distribution, depending on the sample size and our knowledge of the standard deviation of the population. 

Since the sample mean is only an estimate of the true mean, we need to know how accurate it is. In general, to characterize this accuracy, we calculate the interval at 95%. This interval corresponds to:

Interval at 95% = Interval in which there is a 95% chance that the true mean value of the distribution lies.

In statistics, 95% is called the confidence (1-α), and is complementary to the type I risk α=5%. This risk represents the chance that the value of the mean of the distribution lies outside the confidence interval.

Here's how to construct a confidence interval for the mean: 

  1. Calculate the mean and standard deviation of a sample of size n: using the sample data, calculate the sample mean \overline{X} and the sample standard deviation S.
  2. Choose the confidence level (1-α): select a confidence level, often expressed as a percentage, such as 95% or 99%. A confidence level of 95% means that we are 95% confident that the interval we construct will contain the true population mean.
  3. Determine the interval: use the confidence interval formula for the mean according to the appropriate distribution (Student or normal):
    • If you know the population standard deviation 𝜎, use the normal distribution:
      • CI = \overline{X} \pm Z_{\frac{\alpha}{2}} \ast \frac{\sigma}{\sqrt{n}} where:
        • Z_{\frac{\alpha}{2}} is the z-score corresponding to the confidence level (two-sided).
        • n is the sample size.
    • If you don't know the population standard deviation 𝜎, use the Student distribution:
      • CI = \overline{X} \pm t_{\frac{\alpha}{2};n-1} \ast \frac{S}{\sqrt{n}} where:
        • t_{\frac{\alpha}{2};n-1} is the t-score corresponding to the confidence level, for n-1 degrees of freedom.
        • n is the sample size.
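
The steps above can be sketched in Python for the case where the population standard deviation is known. The function names z_critical and mean_ci, as well as the input values (sample mean 50, 𝜎 = 6, n = 36), are hypothetical and chosen only for illustration. The same mean_ci helper would also serve for the Student case, provided the t-score is read from a table and S is passed instead of 𝜎.

```python
from math import sqrt
from statistics import NormalDist


def z_critical(confidence):
    """Two-sided z-score Z_{alpha/2} for a confidence level 1 - alpha."""
    alpha = 1.0 - confidence
    return NormalDist().inv_cdf(1.0 - alpha / 2.0)


def mean_ci(x_bar, spread, n, critical):
    """Return (lower, upper) = x_bar -/+ critical * spread / sqrt(n).

    spread is sigma when used with a z-score, or S when used with a t-score.
    """
    margin = critical * spread / sqrt(n)
    return x_bar - margin, x_bar + margin


# Hypothetical inputs, not taken from the text.
z = z_critical(0.95)
lower, upper = mean_ci(x_bar=50.0, spread=6.0, n=36, critical=z)
print(f"z = {z:.3f}, CI = [{lower:.2f}, {upper:.2f}]")
```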

The confidence interval of the mean therefore gives a range of values within which we are confident at a certain confidence level (1-𝛼) that the true mean of the population µ lies. The higher the confidence level, the wider the interval, reflecting a higher degree of confidence in the estimate. 

Effect of each factor on the width of the confidence interval of the mean (CI):

  • Standard deviation S: CI width increases as the standard deviation increases.
  • Sample size n: CI width decreases as the sample size increases.
  • Confidence (1-α): CI width increases as the confidence increases.
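
These three effects can be checked numerically. The sketch below uses the normal-distribution formula for simplicity; the function name ci_width and all input values are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def ci_width(s, n, confidence):
    """Full width of the z-based interval: 2 * Z_{alpha/2} * s / sqrt(n)."""
    z = NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)
    return 2.0 * z * s / sqrt(n)


base = ci_width(s=4.0, n=30, confidence=0.95)
wider_spread = ci_width(s=8.0, n=30, confidence=0.95)       # S doubled
larger_sample = ci_width(s=4.0, n=120, confidence=0.95)     # n multiplied by 4
higher_confidence = ci_width(s=4.0, n=30, confidence=0.99)  # 99 % instead of 95 %

print(f"base width:        {base:.3f}")
print(f"S doubled:         {wider_spread:.3f}")
print(f"n multiplied by 4: {larger_sample:.3f}")
print(f"99 % confidence:   {higher_confidence:.3f}")
```

Doubling S doubles the width, whereas the sample size must be multiplied by four to halve it, because the width depends on the square root of n.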

Example: We want to calculate the confidence interval for the average sugar consumption per family at 95% confidence. A sample of 18 families was taken. Below is the table of results:

5 13 11 5 2 3
2 1 6 14 6 8
2 13 9 5 12 7

Solution: 

Let's calculate the mean, the standard deviation and the number of degrees of freedom:

\overline{X} = \frac{5+13+11+5+2+3+2+1+6+14+6+8+2+13+9+5+12+7}{18} = \frac{124}{18} \approx 6.89

S = \sqrt{\frac{\sum_{i=1}^{n}(x_{i}-\overline{X})^{2}}{17}} \approx 4.25

n-1 = 17

From the Student's distribution table, or with the Ellistat Data Analysis software, we find the value t = 2.110.

We can therefore deduce the following confidence interval:

\overline{X}-t_{\frac{\alpha}{2};n-1}\ast \frac{S}{\sqrt{n}} \le \mu \le \overline{X}+t_{\frac{\alpha}{2};n-1}\ast \frac{S}{\sqrt{n}}

6.89-2.110\ast \frac{4.25}{\sqrt{18}} \le \mu \le 6.89+2.110\ast \frac{4.25}{\sqrt{18}}

Using the unrounded values of \overline{X} and S, we obtain:

4.773 \le \mu \le 9.005
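
This example can be checked with a few lines of Python. The 18 values are those listed in the calculation of the mean. The t-score 2.110 is taken from the text rather than computed, since Python's standard library has no Student quantile function. The variable names are illustrative.

```python
from math import sqrt
from statistics import mean, stdev

# Sugar consumption of the 18 families, as listed in the mean calculation.
sugar = [5, 13, 11, 5, 2, 3, 2, 1, 6, 14, 6, 8, 2, 13, 9, 5, 12, 7]

n = len(sugar)
x_bar = mean(sugar)
s = stdev(sugar)  # sample standard deviation (divisor n - 1)

t_crit = 2.110  # t-score for 95 % confidence and 17 degrees of freedom, from the text

margin = t_crit * s / sqrt(n)
lower, upper = x_bar - margin, x_bar + margin

print(f"n = {n}, mean = {x_bar:.3f}, S = {s:.3f}")
print(f"95 % CI: [{lower:.3f}, {upper:.3f}]")
```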

Variance / Standard deviation:

To construct a confidence interval for the variance of a population, we use the chi-2 distribution (\chi^{2}). The variance is estimated using the following formula:

S^{2} = \frac{\sum_{i=1}^{n}(x_{i}-\overline{X})^{2}}{n-1}

The chi-2 statistic (\chi^{2}) of the variance is written as follows:

\chi^{2} = \frac{(n-1)S^{2}}{\sigma^{2}}

The chi-2 density function curve (\chi^{2}) resembles that of a normal distribution, but it is not symmetrical, and its shape depends on the number of degrees of freedom. The diagram below shows the chi-2 density function (\chi^{2}) for 4 degrees of freedom.

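The \chi^{2} distribution can be used to deduce the confidence interval of the variance 𝜎², for sample size n and confidence 1-α. Here \chi^{2}_{n-1;p} denotes the value that a chi-2 variable with n-1 degrees of freedom exceeds with probability p.

\frac{(n-1)S^{2}}{\chi^{2}_{n-1;\frac{\alpha}{2}}} \le \sigma^{2} \le \frac{(n-1)S^{2}}{\chi^{2}_{n-1;1-\frac{\alpha}{2}}}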

Process for calculating the confidence interval of the variance : 

  1. Calculate the variance and degrees of freedom: from the sample data, calculate the variance S² and the degrees of freedom (n-1).
  2. Find the critical chi-squared values: find the critical values \chi^{2}_{n-1;\frac{\alpha}{2}} and \chi^{2}_{n-1;1-\frac{\alpha}{2}} for the desired confidence level and degrees of freedom. You can find these values in 𝜒² distribution tables or with the help of Ellistat.
  3. Use the following formula to determine the confidence interval of the variance:

\frac{(n-1)S^{2}}{\chi^{2}_{n-1;\frac{\alpha}{2}}} \le \sigma^{2} \le \frac{(n-1)S^{2}}{\chi^{2}_{n-1;1-\frac{\alpha}{2}}}

NB: the confidence interval of the standard deviation can be deduced from it by taking the square root of each bound.

\sqrt{\frac{(n-1)S^{2}}{\chi^{2}_{n-1;\frac{\alpha}{2}}}} \le \sigma \le \sqrt{\frac{(n-1)S^{2}}{\chi^{2}_{n-1;1-\frac{\alpha}{2}}}}
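
When no table is at hand, the critical values of step 2 can be approximated. The sketch below uses the Wilson-Hilferty approximation, which is a substitute technique not mentioned in this article: it is not exact, and a table or statistical software should be preferred for real work. The function names and the inputs (S² = 2.5, n = 25) are hypothetical.

```python
from statistics import NormalDist


def chi2_upper_tail_approx(df, tail_prob):
    """Approximate the chi-squared value exceeded with probability tail_prob.

    Wilson-Hilferty approximation: an assumption of this sketch,
    not an exact quantile.
    """
    z = NormalDist().inv_cdf(1.0 - tail_prob)
    c = 2.0 / (9.0 * df)
    return df * (1.0 - c + z * c ** 0.5) ** 3


def variance_ci(s2, n, alpha=0.05):
    """Approximate confidence interval of the variance at confidence 1 - alpha."""
    df = n - 1
    chi2_high = chi2_upper_tail_approx(df, alpha / 2.0)       # exceeded with probability alpha/2
    chi2_low = chi2_upper_tail_approx(df, 1.0 - alpha / 2.0)  # exceeded with probability 1 - alpha/2
    return df * s2 / chi2_high, df * s2 / chi2_low


# Hypothetical inputs, not taken from the text.
var_lower, var_upper = variance_ci(s2=2.5, n=25)
sd_lower, sd_upper = var_lower ** 0.5, var_upper ** 0.5
print(f"variance CI: [{var_lower:.3f}, {var_upper:.3f}]")
print(f"std dev CI:  [{sd_lower:.3f}, {sd_upper:.3f}]")
```

The interval is not symmetrical around S², which reflects the asymmetry of the chi-2 distribution.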

Example: a sample of 10 cylinders has been taken from production. We want to get an idea of the process variability. Determine the confidence interval of the variance 𝜎² for a confidence of 95%:

10 10 12 10 11 
10 11 11 10 11 

Solution:  

Let's calculate the variance S² and the number of degrees of freedom:

S^{2} = \frac{\sum_{i=1}^{n}(x_{i}-\overline{X})^{2}}{9} = 0.489

n-1=9 

For a confidence (1-α) of 95%, we can deduce the values of the quantiles used in calculating the confidence interval of the variance: 

\frac{\alpha}{2}=0.025\text{ and } 1-\frac{\alpha}{2} = 0.975

From the 𝜒² distribution table, or with the Ellistat software, we find the values \chi^{2}_{9;\frac{\alpha}{2}} = 19.02 and \chi^{2}_{9;1-\frac{\alpha}{2}} = 2.70

We can therefore calculate the confidence interval of the variance at a confidence of 95%. 

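\frac{(n-1)S^{2}}{\chi^{2}_{n-1;\frac{\alpha}{2}}} \le \sigma^{2} \le \frac{(n-1)S^{2}}{\chi^{2}_{n-1;1-\frac{\alpha}{2}}}

\frac{9 \ast 0.489}{19.02} \le \sigma^{2} \le \frac{9 \ast 0.489}{2.70}

0.231 \le \sigma^{2} \le 1.629

Taking the square root of each bound gives the interval for the standard deviation:

0.481 \le \sigma \le 1.276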
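
The cylinder example can be reproduced in Python with the ten measurements and the two critical values quoted above. The variable names are illustrative, and because the critical values are rounded table values, the last digit of the upper bounds may differ slightly from the figures in the text.

```python
from math import sqrt
from statistics import variance

# The 10 cylinder measurements from the example.
cylinders = [10, 10, 12, 10, 11, 10, 11, 11, 10, 11]

n = len(cylinders)
df = n - 1
s2 = variance(cylinders)  # sample variance (divisor n - 1)

# Critical values for 9 degrees of freedom at 95 % confidence, from the text.
chi2_high = 19.02  # exceeded with probability alpha/2 = 0.025
chi2_low = 2.70    # exceeded with probability 1 - alpha/2 = 0.975

var_lower = df * s2 / chi2_high
var_upper = df * s2 / chi2_low
sd_lower, sd_upper = sqrt(var_lower), sqrt(var_upper)

print(f"S^2 = {s2:.3f}, degrees of freedom = {df}")
print(f"variance CI: [{var_lower:.3f}, {var_upper:.3f}]")
print(f"std dev CI:  [{sd_lower:.3f}, {sd_upper:.3f}]")
```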

Proportion

The confidence interval of a proportion is a range of values within which it is estimated that a proportion of a given population is likely to lie, with a certain probability. In other words, it's an interval of values constructed from sample data, within which the true proportion of the population is estimated to lie, with a given level of confidence. 

There are various methods for calculating the confidence interval of a proportion in statistics, but the two most commonly used are : 

  • Exact method (for small sample sizes). 
  • Approximate method (with normal distribution) 

Exact method (calculated using the binomial distribution)

The exact method for calculating the confidence interval of a proportion is based on the binomial distribution, and provides a precise solution without the approximations made by asymptotic methods. This is particularly useful for small sample sizes or when the observed proportion (\hat{p}) is close to 0 or 1.

Here are the steps for calculating the exact confidence interval of the proportion: 

Step 1: Calculate the proportion observed in a sample of size n with k successes: \hat{p} = \frac{k}{n}. Then determine the bounds of the confidence interval.

Step 2: Calculate the quantiles of the binomial distribution. These quantiles delimit the confidence interval. For a confidence level of 1-α, find the quantile Q1 at probability \frac{\alpha}{2} and the quantile Q2 at probability 1-\frac{\alpha}{2} of the binomial distribution with parameters n and \hat{p}. These quantiles can be found using binomial distribution tables or in Ellistat software.

Step 3: Calculate the confidence interval: the confidence interval is \left[\frac{Q_{1}}{n};\frac{Q_{2}}{n}\right].

Example: suppose that in a sample of size n=20, you have observed k=15 conforming parts. Calculate the exact confidence interval of the proportion of conforming parts for a confidence level of 95%.

Solution: 

The proportion of conforming parts observed: \hat{p} = \frac{15}{20} = 0.75

Determination of Q1 and Q2, for a proportion p=0.75 and a sample size of 20.

Using Ellistat software, we find: Q1=11 (11 gives an 𝛼/2 closest to 0.025) and Q2=18

The confidence interval of the proportion for a confidence of 95% is: \left[\frac{Q_{1}}{n};\frac{Q_{2}}{n}\right] = \left[\frac{11}{20};\frac{18}{20}\right] = [0.55; 0.90]
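
The quantiles Q1 and Q2 of this example can also be obtained with the Python standard library. In this sketch, the quantile at probability q is assumed to be the smallest k such that P(X ≤ k) ≥ q; this convention is an assumption and other tools may use a different one. The function name binomial_quantile is hypothetical.

```python
from math import comb


def binomial_quantile(prob, n, p):
    """Smallest k such that P(X <= k) >= prob for X ~ Binomial(n, p)."""
    cumulative = 0.0
    for k in range(n + 1):
        cumulative += comb(n, k) * p ** k * (1.0 - p) ** (n - k)
        if cumulative >= prob:
            return k
    return n  # guard against rounding when prob is very close to 1


n, k = 20, 15  # sample size and number of conforming parts
alpha = 0.05
p_hat = k / n

q1 = binomial_quantile(alpha / 2, n, p_hat)
q2 = binomial_quantile(1 - alpha / 2, n, p_hat)
interval = (q1 / n, q2 / n)

print(f"p_hat = {p_hat}, Q1 = {q1}, Q2 = {q2}")
print(f"95 % interval: [{interval[0]:.2f}; {interval[1]:.2f}]")
```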

It's important to note that this method gives an accurate solution, but it can be more computationally intensive, especially for large sample sizes, often requiring the use of statistical software to perform the calculations. 

Approximate method (with normal distribution):  

To construct a confidence interval for a proportion in a population, we use the normal distribution if the conditions of the central limit theorem are satisfied. Suppose that a sample of size n is taken from a population that follows the binomial distribution with parameter p. The proportion calculated from this sample is \hat{p}, where: \hat{p} = \frac{x}{n}

With:

  • x : number of successes. 
  • n: sample size. 

The mean and standard deviation of the sample proportion \hat{p} are: \mu_{\hat{p}} = p

\sigma_{\hat{p}} = \sqrt{\frac{p(1-p)}{n}}

The central limit theorem can be applied to the sample proportion if n \ast p \ge 5 and n \ast (1-p) \ge 5. These conditions are generally met for large sample sizes or when the proportion is not close to 0 or 1.

The Z-score formula can therefore be applied, with: \sigma_{\hat{p}} = \sqrt{\frac{p(1-p)}{n}}

If 0 \le \mu_{\hat{p}} \pm 2\sigma_{\hat{p}} \le 1, we can consider that \hat{p} approximately follows a normal distribution.
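
The two conditions mentioned in this section can be checked with a short Python sketch. The function name normal_approx_checks is hypothetical. Because the true proportion p is unknown in practice, the usage lines plug in an observed proportion in its place, which is an assumption of this illustration.

```python
from math import sqrt


def normal_approx_checks(n, p):
    """Evaluate the conditions for the normal approximation of p_hat."""
    mu = p
    sigma = sqrt(p * (1.0 - p) / n)
    return {
        "sigma": sigma,
        # n * p >= 5 and n * (1 - p) >= 5
        "np_ok": n * p >= 5 and n * (1.0 - p) >= 5,
        # 0 <= mu - 2 * sigma and mu + 2 * sigma <= 1
        "two_sigma_ok": 0.0 <= mu - 2.0 * sigma and mu + 2.0 * sigma <= 1.0,
    }


# Observed proportion 0.75 with n = 20, used in place of the unknown p.
checks = normal_approx_checks(n=20, p=0.75)
print(checks)

# A proportion close to 1 with the same sample size fails both conditions.
extreme = normal_approx_checks(n=20, p=0.95)
print(extreme)
```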