Various statistical tests

A statistical test (or hypothesis test) is used to detect significant differences:

  • Between a studied value and a target value (comparison with a theoretical value, or conformity test)
  • Between two populations (population comparison, or homogeneity test)
  • Between two variables (correlation or association test)
  • Between the data and a distribution law (goodness-of-fit test)

From a data sample, the statistical test calculates the probability of obtaining the observed sample configuration (or a more extreme one), assuming that the data are:

  • Compliant with the target, for a comparison with a theoretical value
  • Homogeneous, for a population comparison test
  • Not associated (zero correlation), for a correlation test
  • Consistent with the distribution law, for a goodness-of-fit test

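This hypothesis is called the null hypothesis (H0) because it assumes that there is no difference or no effect. If the calculated probability (the p-value) is small, typically below 5%, the null hypothesis is rejected and the difference is considered significant.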

The most commonly used statistical tests are:

Case study | Parametric tests (assume a distribution law) | Non-parametric tests (no distribution assumption)

Comparison with a theoretical value
  Equality of a frequency to a value | 1-proportion test (1P) | -
  Mean equal to a value | Theoretical z test; theoretical t test | Runs test; sign test

Population comparison
  Comparison of two paired populations | Paired t test | Paired Wilcoxon test; sign test
  Comparison of the location of 2 populations | z test; t test | B vs C test; Mann-Whitney test
  Comparison of the location of k populations | ANOVA | Kruskal-Wallis test
  Comparison of two frequencies | 2-proportion test (2P) | -

Correlations
  Correlation of 2 variables | R² and Student's t | Spearman coefficient; Kendall's tau coefficient
  Correlation of k variables with a Y | Multiple linear regression | -

Population comparison:

These tests compare several populations of quantitative measurements with one another, for example batches produced by two different machines or the grades of different classes.

Example 1: Is the mean output of the red machine higher than that of the blue machine?

(Figure: data for Example 1)
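Example 1 could be handled with a z or t test if the data are roughly normal. As a distribution-free alternative taken from the table, here is a minimal sketch of the Mann-Whitney test in plain Python, assuming the question is one-sided (red larger than blue). It uses the large-sample normal approximation without a tie correction, and the measurements are made-up values, since the Example 1 data are only shown as an image. The function names are illustrative.

```python
from statistics import NormalDist


def average_ranks(values):
    """Return 1-based ranks, giving tied values the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def mann_whitney_greater(a, b):
    """One-sided Mann-Whitney test of H1: values in a tend to be larger than in b.

    Normal approximation without tie correction: only a rough guide for
    small samples or heavily tied data.
    """
    n1, n2 = len(a), len(b)
    ranks = average_ranks(list(a) + list(b))
    r1 = sum(ranks[:n1])             # rank sum of sample a
    u1 = r1 - n1 * (n1 + 1) / 2      # U statistic for sample a
    mean_u = n1 * n2 / 2
    sd_u = (n1 * n2 * (n1 + n2 + 1) / 12) ** 0.5
    z = (u1 - mean_u) / sd_u
    p_value = 1 - NormalDist().cdf(z)  # upper tail: a larger than b
    return u1, z, p_value


# Hypothetical measurements, NOT the Example 1 data.
red = [10.4, 10.9, 11.2, 10.8, 11.5, 10.7, 11.1, 11.3]
blue = [10.1, 10.6, 10.3, 10.9, 10.2, 10.5, 10.0, 10.4]
u, z, p = mann_whitney_greater(red, blue)
print(f"U = {u}, z = {z:.2f}, one-sided p = {p:.4f}")
```

For small samples, exact Mann-Whitney tables (or a statistics package) are preferable to the normal approximation used here.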

Example 2: You have taken a sample of the grades in different maths classes. Are the grades of the different classes homogeneous in mean and in variance?

(Figure: data for Example 2)
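For the "homogeneous in mean" part of Example 2, the table points to ANOVA. The sketch below computes the one-way ANOVA F statistic in plain Python, assuming each class is given as a list of grades; the grades are hypothetical. It stops at the statistic: the p-value comes from the F distribution with (k - 1, N - k) degrees of freedom, for instance from an F table or `scipy.stats.f.sf` if SciPy is available. The "homogeneous in variance" part would need a separate test such as Bartlett's or Levene's.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA on k groups."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    # Between-group sum of squares: spread of the group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within-group sum of squares: spread of each value around its own group mean.
    ss_within = sum(sum((x - m) ** 2 for x in g) for g, m in zip(groups, means))
    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical maths grades for three classes, NOT the Example 2 data.
class_a = [12, 14, 11, 15, 13]
class_b = [10, 9, 12, 11, 10]
class_c = [14, 16, 15, 13, 17]
f_stat, df1, df2 = one_way_anova_f([class_a, class_b, class_c])
print(f"F = {f_stat:.2f} with ({df1}, {df2}) degrees of freedom")
```

A large F means the class means differ by more than the spread inside each class would explain.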

Frequency tests:

Frequency tests compare the proportion of occurrences of a phenomenon across several batches, for example the proportion of defects under one production configuration versus another.

Example 3: You have received two batches from two different suppliers. With the data available, can you tell whether supplier A is significantly better than supplier B?

(Figure: data for Example 3)
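The table lists the 2-proportion test (2P) for this kind of question. Below is a minimal sketch of the pooled two-proportion z test in plain Python, assuming "better" means a lower defect proportion, so the test is one-sided. The defect counts are hypothetical, because the Example 3 data are only shown as an image.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(defects_a, n_a, defects_b, n_b):
    """One-sided z test of H1: supplier A's defect rate is lower than supplier B's."""
    p_a = defects_a / n_a
    p_b = defects_b / n_b
    # Pooled proportion: the common defect rate assumed under H0 (p_a == p_b).
    pooled = (defects_a + defects_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    p_value = NormalDist().cdf(z)  # lower tail: A has fewer defects than B
    return z, p_value


# Hypothetical counts, NOT the Example 3 data:
# 10 defects in 100 parts from A, 20 defects in 100 parts from B.
z, p_value = two_proportion_z_test(10, 100, 20, 100)
print(f"z = {z:.2f}, one-sided p = {p_value:.4f}")
```

The normal approximation is only reasonable when each batch contains a fair number of both defective and good parts; a common rule of thumb is at least about 5 of each.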

Example 4: According to the following results, is there a machining configuration that significantly reduces the incidence of burrs?

Result          | P = 0.02 | P = 0.04 | P = 0.06
Without burrs   | 25       | 22       | 35
With burrs      | 6        | 2        | 1
Borderline      | 4        | 3        | 0
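One common way to analyse a count table like this one is Pearson's chi-square test of independence (not listed in the table above, so treat it as an assumed choice). The sketch below applies it to the Example 4 counts in plain Python, keeping the borderline row as a separate category. Because the table has 3 rows and 3 columns, there are 4 degrees of freedom, and for an even number of degrees of freedom the chi-square tail probability has a closed form, which avoids needing SciPy.

```python
from math import exp, factorial


def chi_square_independence(table):
    """Pearson chi-square statistic and degrees of freedom for a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if the result did not depend on the configuration.
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df


def chi_square_sf_even_df(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable with even df."""
    if df % 2:
        raise ValueError("this closed form only holds for an even number of df")
    half = x / 2
    return exp(-half) * sum(half ** i / factorial(i) for i in range(df // 2))


# Rows: without burrs, with burrs, borderline; columns: P = 0.02, 0.04, 0.06.
burrs = [[25, 22, 35], [6, 2, 1], [4, 3, 0]]
chi2, df = chi_square_independence(burrs)
p_value = chi_square_sf_even_df(chi2, df)
print(f"chi2 = {chi2:.2f}, df = {df}, p = {p_value:.3f}")
```

With these counts the statistic is about 9.6 on 4 degrees of freedom, giving p of about 0.047, right at the usual 5% threshold. Six of the nine expected counts are below 5, so the chi-square approximation is shaky here; merging categories or using an exact test would give a more trustworthy answer.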


Comparison with a theoretical value:

These tests compare a population with a theoretical value.

Example 5: Given the following neutrino speed measurements, can we conclude that neutrinos travel significantly faster than the speed of light, which is about 299,792 km/s?

(Figure: data for Example 5)
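Among the non-parametric options the table gives for a comparison with a theoretical value, the sign test is the simplest to compute exactly. Here is a minimal sketch in plain Python, assuming a one-sided question (speeds above the speed of light, 299,792.458 km/s). Strictly speaking, the sign test compares the median, not the mean, with the reference value. The speed values are hypothetical, not the Example 5 data.

```python
from math import comb

SPEED_OF_LIGHT_KM_S = 299_792.458


def sign_test_greater(values, reference):
    """Exact one-sided sign test of H1: the median is greater than reference.

    Values exactly equal to the reference are discarded, as is usual for this test.
    """
    above = sum(1 for v in values if v > reference)
    below = sum(1 for v in values if v < reference)
    n = above + below
    # Under H0 each remaining value is above the reference with probability 1/2,
    # so the number above follows a Binomial(n, 0.5) distribution.
    p_value = sum(comb(n, k) for k in range(above, n + 1)) / 2 ** n
    return above, n, p_value


# Hypothetical speeds in km/s, NOT the Example 5 data.
speeds = [299_799.1, 299_801.3, 299_790.2, 299_798.7, 299_803.0,
          299_796.4, 299_800.8, 299_794.9, 299_802.2, 299_797.5]
above, n, p_value = sign_test_greater(speeds, SPEED_OF_LIGHT_KM_S)
print(f"{above} of {n} values above c, one-sided p = {p_value:.4f}")
```

If the data look roughly normal, the theoretical t test from the table uses the magnitudes as well as the signs and is usually more powerful.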

Example 6: Assume that the population is 50% women. In a company of 952 people, 440 are women and 512 are men. Is this difference significant?
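Example 6 corresponds to the 1-proportion test (1P) in the table. A minimal sketch in plain Python, using the normal approximation and a two-sided alternative (the question does not say in which direction the difference matters):

```python
from math import sqrt
from statistics import NormalDist


def one_proportion_z_test(successes, n, p0):
    """Two-sided z test of H0: the true proportion equals p0 (normal approximation)."""
    p_hat = successes / n
    se = sqrt(p0 * (1 - p0) / n)  # standard error of p_hat under H0
    z = (p_hat - p0) / se
    p_value = 2 * NormalDist().cdf(-abs(z))
    return z, p_value


z, p_value = one_proportion_z_test(440, 952, 0.5)  # 440 women out of 952 people
print(f"z = {z:.2f}, two-sided p = {p_value:.3f}")
```

Here 440/952 is about 46.2% women, which gives z of about -2.33 and a two-sided p-value of about 0.02. That is below the usual 5% threshold, so under this approach the gap from 50% would be considered significant.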

Correlation test:

Correlation tests check whether two quantitative variables appear to be linked.

Example 7: You measured the breaking strength of a spring against the pressure at which it was manufactured. Does the pressure influence the strength of the spring?

(Figure: data for Example 7)
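The table pairs R² with Student's t for the correlation of two variables. A common form of that test computes Pearson's r and then t = r·√(n − 2) / √(1 − r²), which follows a Student distribution with n − 2 degrees of freedom when there is no correlation. A minimal sketch in plain Python, with hypothetical pressure and strength values:

```python
from math import sqrt


def pearson_r_and_t(x, y):
    """Return (r, r_squared, t) where t tests H0: no linear correlation."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    r = sxy / sqrt(sxx * syy)
    # Student's t statistic with n - 2 degrees of freedom (undefined when |r| == 1).
    t = r * sqrt((n - 2) / (1 - r ** 2))
    return r, r ** 2, t


# Hypothetical data in arbitrary units, NOT the Example 7 data.
pressure = [2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0]
strength = [410, 425, 418, 440, 452, 447, 465]
r, r2, t = pearson_r_and_t(pressure, strength)
print(f"r = {r:.3f}, R² = {r2:.3f}, t = {t:.2f} (df = {len(pressure) - 2})")
```

The t value is then compared with Student's t critical value for n − 2 degrees of freedom. If the relationship is monotonic but not linear, the Spearman or Kendall coefficients from the table are the non-parametric alternatives.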

Multiple linear regression test:

Also called "large table" analysis, this method identifies the factors that influence your Y when you have a large data table in which each row gives a value of Y together with the corresponding values of the X factors.

Example 8: You want to maximize result A with respect to several parameters you have identified. Which factors are significant, and how can result A be maximized?

(Figure: data for Example 8)
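The source does not say which software performs this analysis. As a rough illustration of what a multiple linear regression computes, the sketch below fits the model A = b0 + b1·X1 + … + bk·Xk by ordinary least squares, solving the normal equations (XᵀX)b = XᵀA with Gaussian elimination in plain Python. The factor values and results are hypothetical. The sketch only returns coefficients: deciding which factors are significant also needs their standard errors and t statistics, which a statistics package (for example, the OLS model in statsmodels) reports directly.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        # Swap in the row with the largest pivot to limit rounding errors.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_multiple_regression(rows, y):
    """Least-squares coefficients [b0, b1, ..., bk] for y = b0 + b1*x1 + ... + bk*xk."""
    design = [[1.0] + list(r) for r in rows]  # prepend a column of 1s for b0
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Hypothetical data, NOT the Example 8 data: each row holds factors (X1, X2, X3).
factors = [(20, 1.0, 5), (25, 1.5, 5), (30, 1.0, 7), (20, 2.0, 7),
           (35, 1.5, 6), (30, 2.0, 5), (25, 1.0, 6), (35, 2.0, 7)]
result_a = [52.1, 58.3, 61.0, 55.4, 68.2, 63.9, 56.7, 71.5]
coefficients = fit_multiple_regression(factors, result_a)
print("b0, b1, b2, b3 =", [round(c, 3) for c in coefficients])
```

Once the significant factors are known, the signs of their coefficients suggest in which direction to move each factor to increase A, but only within the range of the data actually tested. Solving the normal equations directly can be numerically fragile when factors are strongly correlated; statistics packages use more stable methods.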