The central limit theorem tells us:
Any system that results from the sum of many factors, independent of each other and of comparable order of magnitude, produces a distribution that tends towards a normal distribution.
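The statement above can be checked with a small simulation. The sketch below is only an illustration under assumed conditions: the factors are taken to be uniform on (-1, 1), and the sample sizes, seed and helper names (`skewness`, `excess_kurtosis`, `simulate_sums`) are arbitrary choices, not part of any particular tool. It compares the shape of a single factor with the shape of a sum of 30 such factors, using excess kurtosis (0 for a normal distribution) as a rough indicator.

```python
import random
import statistics


def skewness(xs):
    """Sample skewness (0 for a symmetric distribution)."""
    m = statistics.fmean(xs)
    s = statistics.pstdev(xs)
    return sum(((x - m) / s) ** 3 for x in xs) / len(xs)


def excess_kurtosis(xs):
    """Sample excess kurtosis (0 for a normal distribution)."""
    m = statistics.fmean(xs)
    s = statistics.pstdev(xs)
    return sum(((x - m) / s) ** 4 for x in xs) / len(xs) - 3.0


def simulate_sums(n_factors, n_samples, seed=0):
    """Each sample is the sum of n_factors independent uniform(-1, 1) factors."""
    rng = random.Random(seed)
    return [
        sum(rng.uniform(-1.0, 1.0) for _ in range(n_factors))
        for _ in range(n_samples)
    ]


# Assumed sizes, chosen only to keep the run short.
single = simulate_sums(n_factors=1, n_samples=20_000)
many = simulate_sums(n_factors=30, n_samples=20_000)

print("1 factor  : excess kurtosis =", round(excess_kurtosis(single), 2))
print("30 factors: excess kurtosis =", round(excess_kurtosis(many), 2))
print("30 factors: skewness        =", round(skewness(many), 2))
```

A single uniform factor is clearly flatter than a normal distribution (strongly negative excess kurtosis), while the sum of 30 factors should show skewness and excess kurtosis close to zero.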
But we can also reason in the opposite direction. If we observe a non-normal distribution, then at least one of the theorem's hypotheses does not hold:
- Case 1: The system is not the sum of many factors; it may, for example, be the product of many factors. In this case the distribution law may be different, and a transformation will often restore a normal distribution. Taking the log of the result, for example, turns a product of factors into a sum of their logs.
- Case 2: The factors are not independent of each other.
- Case 3: The factors are not of the same order of magnitude:
  - One factor outweighs the others. In this case, we need to find the factor in question, as it alone generates a significant source of variability.
  - An outlier is polluting the distribution. In this case, we need to find the cause of the outlier and eliminate it if the cause can be explained.
In both of these situations, there is no point in looking for a distribution law that fits the observed variability. Such a law would not be repeatable over time, because it stems from a single factor, and it would therefore have no predictive value.
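To make case 3 concrete, here is a minimal simulation sketch of a dominant factor. Everything in it is assumed for illustration: twenty small uniform factors, plus one hypothetical two-state factor that shifts the result by -8 or +8 (for instance, two different machine settings). It shows that the total is no longer close to normal and that the dominant factor alone carries most of the variance.

```python
import random
import statistics


def excess_kurtosis(xs):
    """Sample excess kurtosis (0 for a normal distribution)."""
    m = statistics.fmean(xs)
    s = statistics.pstdev(xs)
    return sum(((x - m) / s) ** 4 for x in xs) / len(xs) - 3.0


rng = random.Random(1)
N_SAMPLES = 20_000  # assumed sample size

# Twenty small factors of comparable size: their sum is close to normal.
small = [
    sum(rng.uniform(-1.0, 1.0) for _ in range(20))
    for _ in range(N_SAMPLES)
]

# Hypothetical dominant factor: a two-state effect of -8 or +8.
dominant = [rng.choice((-8.0, 8.0)) for _ in range(N_SAMPLES)]

# Observed result: small factors plus the dominant one.
total = [s + d for s, d in zip(small, dominant)]

# Share of the total variance carried by the dominant factor
# (a meaningful ratio here because the factors are simulated independently).
variance_share = statistics.pvariance(dominant) / statistics.pvariance(total)

print("small factors only: excess kurtosis =", round(excess_kurtosis(small), 2))
print("with dominant one : excess kurtosis =", round(excess_kurtosis(total), 2))
print("variance share of dominant factor   =", round(variance_share, 2))
```

In practice the dominant factor is not known in advance. The simulation only illustrates why identifying it is more useful than fitting a distribution to the mixed data.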
If the non-normality is due to case 1, it is worth finding the corresponding distribution law, especially if you want to predict the percentage of values outside tolerance. To do this, you can use software such as the Data Analysis module of Ellistat and check the proposed distributions at the bottom of the window to see whether one of them accounts for the observed data.
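As a rough illustration of case 1, the sketch below simulates a result built as a product of many factors, fits a normal distribution to the logs of the data, and uses it to predict the fraction of values above an upper tolerance. It is not how Ellistat works internally; it uses only the Python standard library, and the factor range (0.8 to 1.25), the number of factors and the tolerance limit of 6.0 are all hypothetical values chosen for the example.

```python
import math
import random
import statistics

rng = random.Random(2)
N_SAMPLES = 20_000       # assumed sample size
N_FACTORS = 30           # assumed number of multiplicative factors
UPPER_TOLERANCE = 6.0    # hypothetical upper tolerance limit


def simulate_product():
    """One result: the product of N_FACTORS independent factors near 1."""
    value = 1.0
    for _ in range(N_FACTORS):
        value *= rng.uniform(0.8, 1.25)
    return value


data = [simulate_product() for _ in range(N_SAMPLES)]

# Fraction actually observed above the tolerance limit.
observed = sum(x > UPPER_TOLERANCE for x in data) / N_SAMPLES

# Naive approach: fit a normal distribution directly to the raw data.
naive_fit = statistics.NormalDist.from_samples(data)
naive_predicted = 1.0 - naive_fit.cdf(UPPER_TOLERANCE)

# Case 1 approach: the log of a product is a sum of logs, so fit the
# normal distribution to the logs (equivalent to a log-normal law).
logs = [math.log(x) for x in data]
log_fit = statistics.NormalDist.from_samples(logs)
log_predicted = 1.0 - log_fit.cdf(math.log(UPPER_TOLERANCE))

print("observed fraction above tolerance :", round(observed, 4))
print("predicted, normal on raw data     :", round(naive_predicted, 4))
print("predicted, normal on log of data  :", round(log_predicted, 4))
```

With these assumed settings, the prediction based on the logs should land much closer to the observed fraction than the naive normal fit, which is the reason for looking for the appropriate distribution law before estimating out-of-tolerance percentages.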