Statistical position parameters

Statistical position parameters are measures used in statistics to locate the central or typical position of the data in a set of values. The main statistical position parameters are:

  • Average: the average (or mean) is the sum of all values divided by the total number of values. It is sensitive to outliers because every value enters the calculation. 
  • Median: the median is the value that divides the data set into two equal parts when sorted in ascending order. It is less sensitive to outliers than the average. 
  • Mode: the mode is the value that appears most frequently in a data set. There may be one mode (unimodal distribution) or several modes (multimodal distribution). 

These three parameters give different indications of the central position of the data, and are used to understand the central tendency of a set of values. 

The average:

The average, also known as the arithmetic mean, is a fundamental concept in statistics and mathematics. It represents the position of the distribution on the real number line. In statistics, the population mean is usually denoted by the Greek letter 𝜇, while the sample mean is denoted by the letter X (commonly written X̄, "X bar").

When the probability density function f(x) of the distribution is known, the exact mean is given by:

\mu=\int_{-\infty}^{+\infty} x f(x)\,dx
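
As a worked illustration (the uniform distribution is an assumption chosen here for simplicity, not a case taken from the text), take f(x) = 1/(b-a) on the interval [a, b] and 0 elsewhere. The integral then reduces to:

\mu=\int_{a}^{b}\frac{x}{b-a}\,dx=\frac{b^{2}-a^{2}}{2(b-a)}=\frac{a+b}{2}

In that case the mean is simply the midpoint of the interval.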

In practice, we rarely know the equation of the distribution, but we do have a series of n measured values x_1, ..., x_n. We therefore calculate an estimate X of the mean 𝜇:

X = \frac{1}{n}\sum_{i=1}^{n} x_i = \frac{\text{sum of values}}{\text{total number of values}}

  • 𝑥𝑖: ith value in the series of values 
  • n: number of measured values 

The mean value represents a central value that is used to characterize the data set. It is sensitive to extreme values, which means that a single very large or very small value can influence the mean considerably. 

Example: suppose you have the following numbers: 9, 9, 10, 11 and 11. The average X of this sample is:

X = \frac{\text{sum of values}}{\text{total number of values}} = \frac{9+9+10+11+11}{5}

X = \frac{50}{5} = 10
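
The same calculation can be sketched in a few lines of Python. The function name `arithmetic_mean` is a hypothetical helper introduced for illustration; it is not part of Ellistat.

```python
def arithmetic_mean(values):
    """Return the sum of the values divided by the number of values."""
    if not values:
        raise ValueError("the mean of an empty series is undefined")
    return sum(values) / len(values)


# The sample from the example above
sample = [9, 9, 10, 11, 11]
sample_mean = arithmetic_mean(sample)
print(sample_mean)  # 10.0
```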

The median:

The median is a measure of central tendency, often denoted by the letter 𝜂. Unlike the mean, which is calculated by adding all the values in a data set and dividing by the total number of values n, the median is the value in the middle of the data set once it has been sorted in ascending or descending order. 

To find the median of a data set:

  1. Sort the values in the dataset in ascending or descending order. 
  2. If the data set has an odd number of values, the median is the value exactly in the middle of the sorted series. 
  3. If the data set has an even number of values, the median is the average of the two values in the middle of the sorted series.  

Example, case n odd: consider the following data set: 2, 4, 7, 1, 9, 3, 5. 

  1. Sort values in ascending order: 1, 2, 3, 4, 5, 7, 9. 
  2. As this data set has an odd number of values (7 values), the median is the value in the middle of the sorted series, i.e. the fourth value, which is 4. 

Example, case n even: consider the following data set: 2, 4, 6, 8, 10, 12. 

  1. Sort values in ascending order: 2, 4, 6, 8, 10, 12. 
  2. As this data set has an even number of values (6 values), the median is the average of the two middle values, i.e. (6 + 8) / 2 = 7. 
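
The procedure above (sort, then take the middle value or the average of the two middle values) can be sketched as follows. The function name `median_of` is a hypothetical helper chosen for this illustration.

```python
def median_of(values):
    """Return the median: sort, then pick the middle value(s)."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("the median of an empty series is undefined")
    middle = n // 2
    if n % 2 == 1:
        # Odd number of values: the single middle value
        return ordered[middle]
    # Even number of values: average of the two middle values
    return (ordered[middle - 1] + ordered[middle]) / 2


odd_median = median_of([2, 4, 7, 1, 9, 3, 5])    # 4
even_median = median_of([2, 4, 6, 8, 10, 12])    # 7.0
print(odd_median, even_median)
```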

Special features of the median

When the distribution of the data is not symmetrical (for example, the distribution of salaries in France), the average is of limited interest, because it is pulled strongly towards the side where the tail of the distribution extends: adding 10 billionaires to the French population would be likely to raise the average. The median, by contrast, would change little or not at all, since 10 extra values barely shift the middle of a very large population (30.1 million working people in France). 
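
A small sketch makes this effect visible. The salary figures below are invented for illustration only and are not real French data; `mean` and `median` come from Python's standard `statistics` module.

```python
from statistics import mean, median

# Invented monthly salaries (illustrative values, not real data)
salaries = [1800, 2000, 2200, 2400, 2600]

# Same series with one extreme value added
with_outlier = salaries + [1_000_000]

mean_before, median_before = mean(salaries), median(salaries)
mean_after, median_after = mean(with_outlier), median(with_outlier)

print("mean:  ", mean_before, "->", mean_after)
print("median:", median_before, "->", median_after)
```

With these numbers, the single extreme value moves the mean from 2200 to 168500, while the median only moves from 2200 to 2300.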

Using the Ellistat Data Analysis module.

The mode:

In statistics, the mode is the value that appears most frequently in a data set. It is the value with the highest frequency, i.e. the number of times it is repeated in the set. A data set can have one mode, several modes or no mode at all. 

The mode is particularly useful for categorical data, such as colors, vehicle types or product categories. However, it can also be applied to discrete numerical data. 

For example, consider the following data set: 

2, 3, 5, 3, 7, 2, 8, 3 

In this set, the number 3 appears three times, more often than any other number (2 appears twice, the others once), so 3 is the mode of this data set. 
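
Counting frequencies is all that is needed to find the mode, which is why it also works for categorical data. The sketch below uses `collections.Counter` from the Python standard library; the list of colors is an invented example.

```python
from collections import Counter

# Numerical data from the example above
data = [2, 3, 5, 3, 7, 2, 8, 3]
counts = Counter(data)
mode_value, mode_count = counts.most_common(1)[0]
print(mode_value, mode_count)  # 3 3

# Categorical data (invented example)
colors = ["red", "blue", "red", "green", "red", "blue"]
color_mode, color_count = Counter(colors).most_common(1)[0]
print(color_mode, color_count)  # red 3
```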

It's important to note that the mode gives no indication of the dispersion of the data and, unlike the mean and median, does not summarize the overall position of the values: it simply identifies the most frequent one. A data set is unimodal if a single value occurs more frequently than all the others, and bimodal if two values are tied as the most frequent. (This is typically the case when two different populations are mixed: parts from two different suppliers, for example.) 
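
When several values may be tied, `statistics.multimode` (Python 3.8 and later) returns all of them. In this sketch the measurements attributed to two suppliers are invented purely to illustrate the bimodal case.

```python
from statistics import multimode

# Unimodal: a single most frequent value
unimodal = multimode([2, 3, 5, 3, 7, 2, 8, 3])
print(unimodal)  # [3]

# Bimodal: invented measurements mixing two hypothetical suppliers,
# one centered on 10 and the other on 15
supplier_a = [10, 10, 10, 11]
supplier_b = [14, 15, 15, 15]
bimodal = multimode(supplier_a + supplier_b)
print(bimodal)  # [10, 15]
```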

Using the Ellistat Data Analysis module.