Normal distribution


In statistics, the normal distribution is one of the most important and commonly used probability distributions. It is also known as the bell curve, the natural distribution or the Gaussian distribution, in honor of the mathematician Carl Friedrich Gauss, who studied its properties in detail.

The normal distribution is characterized by its symmetrical bell shape: most values cluster around the mean, and values become less frequent the farther they lie from the mean, in either direction. The normal distribution is defined by two parameters:

Mean (µ): This is the center of the bell, representing the value around which the other values cluster.

Standard deviation (σ): This is a measure of the dispersion of values in relation to the mean. The greater the standard deviation, the greater the dispersion of values.

The probability density function of the normal distribution for a random variable with mean μ and standard deviation σ is given by the following formula:

$f(x)=\frac{1}{\sigma\sqrt{2\pi}}e^{-\frac{(x-\mu)^{2}}{2\sigma^{2}}}$
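As a minimal sketch, the density formula above can be evaluated directly with Python's standard library. The function name `normal_pdf` and its default arguments are illustrative choices, not part of the original text.

```python
import math

def normal_pdf(x: float, mu: float = 0.0, sigma: float = 1.0) -> float:
    """Density of the normal distribution with mean mu and standard deviation sigma at x."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    coeff = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    exponent = -((x - mu) ** 2) / (2.0 * sigma ** 2)
    return coeff * math.exp(exponent)

# Density at the mean of the standard normal (mu = 0, sigma = 1)
peak = normal_pdf(0.0)
print(round(peak, 4))  # 0.3989

# Symmetry: points at the same distance on either side of the mean have the same density
print(math.isclose(normal_pdf(-1.0), normal_pdf(1.0)))  # True
```

The peak value 1/(σ√(2π)) shrinks as σ grows, which is the flattening of the bell that a larger standard deviation produces.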

This distribution has several important properties:

Symmetry: The distribution is symmetrical with respect to its mean.

Bell shape: Most values are close to the mean, and the probability of extreme values decreases rapidly as you move away from the mean.

68-95-99.7 Rule: Approximately 68% of values lie within one standard deviation of the mean, 95% within two standard deviations and 99.7% within three standard deviations.
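The 68-95-99.7 rule can be checked numerically. The sketch below relies on the standard identity P(|X - μ| ≤ kσ) = erf(k/√2), which holds for any normal variable; the helper name `prob_within` is an assumption made for illustration.

```python
import math

def prob_within(k: float) -> float:
    """Probability that a normal variable lies within k standard deviations of its mean."""
    return math.erf(k / math.sqrt(2.0))

for k in (1, 2, 3):
    print(k, round(100 * prob_within(k), 2))
# 1 68.27
# 2 95.45
# 3 99.73
```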

The normal distribution is used in many areas of statistics, including statistical inference, modeling and hypothesis testing, due to its well-known mathematical properties and its frequency of occurrence in many natural and experimental phenomena.

Reduced normal distribution

The "centered reduced" normal distribution, more commonly called the standard normal distribution, is a normal distribution with a mean of 0 and a standard deviation of 1. It is one of the most commonly used distributions in statistics.

In fact, any normal variable can be transformed into a reduced centered normal by subtracting the variable's mean and dividing by its standard deviation. This normalization is useful for comparing variables that may initially have different units or scales. It also simplifies calculations in many statistical contexts.

For a random variable X following a normal distribution with mean μ and standard deviation σ, the normalization of X to obtain the reduced centered normal variable (often noted Z) is done using the formula:

$Z=\frac{X-\mu}{\sigma}$

The Z value represents the number of standard deviations from the mean. It can be positive or negative.

A value of Z=2 means that the point lies above the mean µ, at a distance of 2 standard deviations σ.

A value of Z=-3.5 means that the point lies below the mean µ, at a distance of 3.5 standard deviations σ.
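The standardization can be sketched in a few lines. The values μ = 50 and σ = 4 below are hypothetical, chosen only so that the two observations reproduce Z = 2 and Z = -3.5.

```python
def z_score(x: float, mu: float, sigma: float) -> float:
    """Number of standard deviations between x and the mean (signed)."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    return (x - mu) / sigma

mu, sigma = 50.0, 4.0  # hypothetical parameters for illustration

print(z_score(58.0, mu, sigma))  # 2.0  -> 2 standard deviations above the mean
print(z_score(36.0, mu, sigma))  # -3.5 -> 3.5 standard deviations below the mean
```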

With this transformation, we can use the table of the reduced centered (standard) normal distribution. This table gives the values of the distribution function F(Z) of the standard normal distribution as a function of Z.

$F(Z)=\int_{-\infty}^{Z}\frac{1}{\sqrt{2\pi}}e^{-\frac{u^{2}}{2}}\,du$

With :

F(Z) : The distribution function of the standard normal distribution. It is a mathematical function that gives the probability that a random variable following a standard normal distribution is less than or equal to a given value.

𝐹(𝑍)=𝑃(𝑧 ≤ 𝑍)

The value of F(Z) is always between 0 and 1, as it is a probability.
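In software, F(Z) is usually computed from the error function through the standard identity F(Z) = ½(1 + erf(Z/√2)). The following is a minimal sketch using only the standard library; the function name `standard_normal_cdf` is an illustrative choice.

```python
import math

def standard_normal_cdf(z: float) -> float:
    """F(z) = P(Z <= z) for a standard normal variable, via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

print(standard_normal_cdf(0.0))            # 0.5
print(round(standard_normal_cdf(2.0), 4))  # 0.9772
```

The result is a probability, so it always lies between 0 and 1 and increases with z.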

The values of the distribution function F(Z) for the standard normal distribution are used in many areas of statistics to perform probability calculations, including hypothesis testing, confidence intervals, non-conformance rate estimation, process reliability estimation and other statistical analyses.

The distribution function F(Z) cannot be expressed in terms of elementary functions (such as polynomials, exponentials or trigonometric functions), so statistical tables or computer software are needed to obtain the probabilities associated with specific values of Z. For the normal distribution, the reduced centered normal distribution table, also known as the Z table, is used to look up F(Z).

Example:

Find the values of the following probabilities using the standard normal distribution table:

𝑃(𝑧≤0), 𝑃(𝑧≤-2), 𝑃(𝑧≥1.55), 𝑃(-2≤ 𝑧 ≤1.55)

Solution:

Probability

𝑃(𝑧≤0) = 0.5

𝑃(𝑧≤-2) = 𝑃(𝑧≥2) (by symmetry) = 1-𝑃(𝑧≤2) = 1-0.9772 = 0.0228

𝑃(𝑧≥1.55) = 1-𝑃(𝑧≤1.55)= 1-0.9394 = 0.0606

𝑃(-2≤𝑧≤1.55) = 𝑃(𝑧≤1.55)-𝑃(𝑧≤-2) = 0.9394-0.0228 = 0.9166
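These table lookups can be cross-checked in software. The sketch below assumes Python 3.8 or later, where `statistics.NormalDist` is available in the standard library; the variable names are illustrative.

```python
from statistics import NormalDist

std = NormalDist()  # standard normal: mean 0, standard deviation 1

p_le_0 = std.cdf(0)                       # P(z <= 0)
p_le_minus2 = std.cdf(-2)                 # P(z <= -2)
p_ge_155 = 1 - std.cdf(1.55)              # P(z >= 1.55)
p_between = std.cdf(1.55) - std.cdf(-2)   # P(-2 <= z <= 1.55)

for p in (p_le_0, p_le_minus2, p_ge_155, p_between):
    print(round(p, 4))
```

The last value may differ from the hand calculation in the fourth decimal place, because the table values are rounded before they are subtracted.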

Calculating the percentage outside tolerance

As discussed when establishing the characteristics of the normal distribution, it is fully characterized as soon as its mean and standard deviation are known. More specifically :

Around 68.27% of observations fall within one standard deviation of the mean.

Around 95.45% of observations fall within two standard deviations of the mean.

Around 99.73% of observations fall within three standard deviations of the mean.

These percentages describe how the data are distributed around the mean in a normal distribution, providing valuable indications of the dispersion of values in relation to the mean.

However, to assess more precisely the percentage of elements outside the tolerated limits in a population, it is possible to use the z-number calculation.

The number z is calculated as follows:

$Z = \frac{\mu-\text{tolerance}}{\sigma}$

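It represents the distance between the sample mean and the tolerance limit, measured in standard deviations. The distance is taken as positive, so for an upper tolerance limit above the mean the numerator becomes tolerance - µ.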

Once the number z has been determined, it is possible to calculate the percentage of elements outside tolerance by referring to the Gaussian table or the centered reduced normal distribution table. This table allows you to find the proportion of values beyond a certain distance (represented by the number z) from the mean in a normal distribution, which helps to assess the percentage of elements outside tolerated limits.

Example:

Find the total out-of-tolerance percentage, given a mean diameter µ = 10.1 mm, a standard deviation σ = 0.5 mm and a tolerance interval IT = [9; 11] mm.

Let's calculate the z_min:

$Z_{min} = \frac{\mu-\text{tolerance}}{\sigma} = \frac{10.1-9}{0.5} = 2.2$

The percentage of parts below the lower tolerance limit (% HT min) can be deduced from the Gauss table, which gives F(2.2) = 98.61%:

% HT min = 100% - 98.61% = 1.39%

Let's calculate the z_max:

$Z_{max} = \frac{\text{tolerance}-\mu}{\sigma} = \frac{11-10.1}{0.5} = 1.8$

The percentage of parts above the upper tolerance limit (% HT max) can be deduced from the Gauss table, which gives F(1.8) = 96.41%:

% HT max = 100% - 96.41% = 3.59%

We therefore deduce the total out-of-tolerance percentage :

% HT= % HTmin +% HTmax

% HT = 1.39% + 3.59% = 4.98% ≈ 5%
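The same calculation can be sketched in code without a table. This assumes Python 3.8 or later for `statistics.NormalDist`; the function name `out_of_tolerance` and its return layout are illustrative choices, not part of the original method.

```python
from statistics import NormalDist

def out_of_tolerance(mu: float, sigma: float, lower: float, upper: float):
    """Return the fractions below the lower limit, above the upper limit, and in total."""
    dist = NormalDist(mu, sigma)
    below = dist.cdf(lower)          # share of parts under the lower tolerance limit
    above = 1.0 - dist.cdf(upper)    # share of parts over the upper tolerance limit
    return below, above, below + above

# Values from the example: mean 10.1 mm, standard deviation 0.5 mm, IT = [9; 11]
below, above, total = out_of_tolerance(10.1, 0.5, 9.0, 11.0)
print(f"{100 * below:.2f}% {100 * above:.2f}% {100 * total:.2f}%")
# 1.39% 3.59% 4.98%
```

Working directly with the distribution function of N(µ, σ) avoids the sign question for the upper limit, since the z conversion happens inside `cdf`.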
