Central Tendencies in Statistics
Central tendency is defined as “the statistical measure that identifies a single value as representative of an entire distribution.” It aims to describe the entire data set accurately with the single value that is most typical or representative of the collected data. It lets us know what is 'normal' or 'average' for a set of data.
The central tendency also allows us to compare one data set to another. For example, let's say we have a sample of girls and a sample of boys, and we are interested in comparing their heights. By calculating the average height for each sample, we could easily draw comparisons between the girls and boys. Central tendency is also useful when we want to compare one piece of data to the entire data set.
For example, suppose a student takes an online exam and scores 60%, which he or she assumes is below average. After leaving the exam hall, the student finds out that the average score is 45%, so 60% is in fact well above average, and the student is likely to rank among the higher scorers once grading is done. Without knowing the central tendency, the student would have been upset for no reason.
Types of central tendencies
There are three basic measures of central tendency: mean, median, and mode. Each of these measures gives a different indication of the typical or central value in the distribution.
The mean is simply the arithmetic average of the data. To calculate it, we add up all the values in the data and divide the sum by the total number of values. It is often the preferred measure of central tendency because it takes every value in the data set into account.
Let us take the example of marks scored by 20 students of class A in an exam out of 25:
21, 24, 25, 23, 22, 26, 25, 18, 16, 24, 25, 20, 20, 25, 24, 21, 17, 20, 21, 25
To calculate the mean, we take the sum of the values and divide it by the number of observations. Here the sum is 442, so the mean is 442/20 = 22.1.
Now, let us take another example of marks scored by 10 students of class B in an exam out of 25:
24, 23, 22, 25, 18, 25, 20, 25, 21, 21
Again, we take the sum of the values and divide it by the number of observations. Here the sum is 224, so the mean is 224/10 = 22.4.
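From the above, we can see that even though the total of the marks is higher in class A (442 against 224), its average mark is lower than that of class B. The total is larger only because class A has twice as many students; the mean removes the effect of group size.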
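The two calculations above can be reproduced with a short Python sketch. The variable names (class_a, class_b) and the helper arithmetic_mean are illustrative choices, not part of the original text; the marks are copied from the two examples.

```python
# Marks copied from the class A and class B examples in the text.
class_a = [21, 24, 25, 23, 22, 26, 25, 18, 16, 24,
           25, 20, 20, 25, 24, 21, 17, 20, 21, 25]
class_b = [24, 23, 22, 25, 18, 25, 20, 25, 21, 21]


def arithmetic_mean(values):
    """Return the sum of the values divided by how many there are."""
    return sum(values) / len(values)


total_a, mean_a = sum(class_a), arithmetic_mean(class_a)
total_b, mean_b = sum(class_b), arithmetic_mean(class_b)

print(f"class A: total={total_a}, n={len(class_a)}, mean={mean_a:.1f}")
print(f"class B: total={total_b}, n={len(class_b)}, mean={mean_b:.1f}")

# A larger total does not imply a larger mean.
print(total_a > total_b, mean_a < mean_b)
```

The standard library function statistics.mean would give the same results; the hand-written helper is used here only to mirror the "sum divided by count" description.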
Let us now take two subsets of the class A marks, with 5 marks in each. The full data set is called the population, and each subset is called a sample.
The first subset is 21, 24, 25, 23, 22; its mean is 115/5 = 23.
The second subset is 21, 17, 20, 21, 25; its mean is 104/5 = 20.8.
We can see here that the population mean (22.1) need not be equal to a sample mean, and that two samples from the same population can have different means.
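As a rough check of the population and sample means, the sketch below takes the first five and the last five class A marks, which are the two subsets used in the text. The names population, sample_1 and sample_2 are illustrative assumptions.

```python
from statistics import mean

# Class A marks, treated here as the population.
population = [21, 24, 25, 23, 22, 26, 25, 18, 16, 24,
              25, 20, 20, 25, 24, 21, 17, 20, 21, 25]

# The two subsets from the text: the first five and the last five marks.
sample_1 = population[:5]    # 21, 24, 25, 23, 22
sample_2 = population[-5:]   # 21, 17, 20, 21, 25

population_mean = mean(population)
sample_1_mean = mean(sample_1)
sample_2_mean = mean(sample_2)

print(f"population mean: {population_mean:.1f}")
print(f"sample 1 mean:   {sample_1_mean:.1f}")
print(f"sample 2 mean:   {sample_2_mean:.1f}")
```

Neither sample mean equals the population mean, and one lies above it while the other lies below.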
In this way, the mean gives a value around which the whole data set is distributed.
Pros and Cons of Mean:
The mean is often the preferred measure of central tendency because it considers all of the values in the data set. Note that the mean of a data set need not itself be an observation in the data set.
The data must be numerical to find the mean. This means that the mean cannot be calculated for categorical data, such as data on gender, appearance, or race. Outliers also have a strong impact on the mean of the data.
The median is the middle value of the observations in the data set. To find the median, we arrange the data in either ascending or descending order and then locate the middle of the ordered list. If the number of observations is odd, we take the middle value directly; if the number of observations is even, we take the average of the two middle values.
To find the median of the class A marks from the example above, we first arrange the values in ascending order:
16, 17, 18, 20, 20, 20, 21, 21, 21, 22, 23, 24, 24, 24, 25, 25, 25, 25, 25, 26
As the number of observations is even (20), we take the average of the two middle values, the 10th and the 11th. Here the median is (22+23)/2 = 22.5.
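The procedure described above (sort, then take the middle value or the average of the two middle values) can be sketched as follows. The helper name median_by_sorting is a hypothetical choice for illustration; the result is cross-checked against the standard library's statistics.median.

```python
from statistics import median

# Class A marks from the text.
class_a = [21, 24, 25, 23, 22, 26, 25, 18, 16, 24,
           25, 20, 20, 25, 24, 21, 17, 20, 21, 25]


def median_by_sorting(values):
    """Sort the values, then return the middle one (odd count)
    or the average of the two middle ones (even count)."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


ordered_a = sorted(class_a)
median_a = median_by_sorting(class_a)

print(ordered_a)
print("two middle values:", ordered_a[9], ordered_a[10])
print("median:", median_a)
print("matches statistics.median:", median_a == median(class_a))
```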
Pros and Cons of Median:
The median is less affected by outliers and skewed data than the mean, and is usually the preferred measure of central tendency when the distribution is not symmetrical.
The median cannot be identified for nominal categorical data, because such data cannot be logically ordered. The median does not take into account the precise value of each observation and hence does not use all the information available in the data. Unlike the mean, the median does not lend itself easily to further mathematical calculation, and hence it is used in fewer statistical tests.
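To illustrate how differently the mean and the median react to an outlier, the sketch below adds one extreme value to the class B marks. The value 250 is a hypothetical data-entry error (25 typed with an extra zero) invented for this illustration; it is not part of the original data.

```python
from statistics import mean, median

# Class B marks from the text.
class_b = [24, 23, 22, 25, 18, 25, 20, 25, 21, 21]

# Hypothetical outlier: a mark of 25 mistyped as 250 (an assumption,
# not part of the original data).
with_outlier = class_b + [250]

shift_in_mean = mean(with_outlier) - mean(class_b)
shift_in_median = median(with_outlier) - median(class_b)

print(f"mean:   {mean(class_b):.2f} -> {mean(with_outlier):.2f}")
print(f"median: {median(class_b):.2f} -> {median(with_outlier):.2f}")
print(f"shift in mean:   {shift_in_mean:.2f}")
print(f"shift in median: {shift_in_median:.2f}")
```

In this sketch a single bad value nearly doubles the mean, while the median moves by only half a mark.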
The mode is the value in a data set that occurs most frequently. To find it, we count the number of times each value occurs in the data set; the mode is the value with the highest tally. A data set can have more than one mode.
In the class A marks from the example above, the value that occurs most often is 25, which appears 5 times. Hence the mode here is 25.
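The tallying described above can be sketched with collections.Counter from the Python standard library. The variable names are illustrative.

```python
from collections import Counter

# Class A marks from the text.
class_a = [21, 24, 25, 23, 22, 26, 25, 18, 16, 24,
           25, 20, 20, 25, 24, 21, 17, 20, 21, 25]

# Count how many times each mark occurs.
tally = Counter(class_a)

# Print the tally, most frequent mark first.
for mark, count in tally.most_common():
    print(f"{mark}: {count}")

# The mode is the mark with the highest tally.
mode_value, mode_count = tally.most_common(1)[0]
print("mode:", mode_value, "occurring", mode_count, "times")
```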
Pros and Cons of Mode:
The main advantage of the mode is that it is easy to calculate. It is not affected by outliers, and it can also be used for categorical data.
The mode is not based on all the values. We cannot find a mode when no value in the data is repeated, so a data set can have no mode, a single mode, or more than one mode.
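The three cases (no mode, one mode, several modes) and the use of the mode on categorical data can be sketched as below. The helper modes and the list shirt_colours are hypothetical and invented for illustration; the numeric lists are taken from the class A marks. The helper follows the convention used in this text, where data with no repeated value has no mode.

```python
from collections import Counter


def modes(values):
    """Return every value with the highest tally, in first-seen order.

    Follows the convention of this text: if no value is repeated,
    the data has no mode and an empty list is returned.
    """
    tally = Counter(values)
    if not tally:
        return []
    highest = max(tally.values())
    if highest == 1:
        return []
    return [value for value, count in tally.items() if count == highest]


# Hypothetical categorical data (not from the original text).
shirt_colours = ["red", "blue", "red", "green"]

single_mode = modes(shirt_colours)         # one mode, categorical data
two_modes = modes([21, 17, 20, 21, 20])    # 21 and 20 both occur twice
no_mode = modes([16, 17, 18])              # no value is repeated

print(single_mode, two_modes, no_mode)
```

Conventions differ on the last case: Python's statistics.multimode, for instance, returns all the values when none is repeated, rather than an empty list.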