Descriptive statistics, as the name suggests, is used to summarize or describe a data set. A data set is a collection of observations or responses gathered from a population or from a sample of a population. A population is the entire group about which data is collected, while a sample is a smaller subset that is generally chosen to represent the whole population. A population does not have to consist of people; it can be any other kind of entity, such as companies, cities, or countries.
Descriptive statistics is part of quantitative research: when working with data sets, statistical measures such as the mean, standard deviation, and frequency are computed on variables. These measures help us describe the characteristics of a variable and the relationships between variables.
Suppose you are running a survey on which genre of books a population likes.
The responses will cover all sorts of genres, so you will need to prepare a visualization to get a readable picture of what percentage of people like each genre. The graphical representation can be a bar graph, a pie chart, or something similar, with colour codes and labels.
The next step after descriptive statistics is inferential statistics, which helps us test whether a hypothesis is supported by the data and check whether the findings can be generalized to a larger population.
There are statistical ways to
A frequency distribution indicates how many times each value occurs in a data set. It can be represented using graphs or tables, and the counts can be given as numbers or percentages.
Example 1: A survey of 20 people choosing their favourite book genre. The frequency distribution table would look something like this:
The table lists each genre and the number of times it was chosen in the survey. This is called a simple frequency distribution.
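As a rough sketch of how a simple frequency distribution could be built, the Python snippet below counts survey responses with `collections.Counter`. The 20 genre responses are invented for illustration and are not the article's survey data.

```python
from collections import Counter

# Hypothetical responses from 20 people (illustrative data only)
responses = (
    ["Fantasy"] * 6
    + ["Mystery"] * 5
    + ["Romance"] * 4
    + ["Science fiction"] * 3
    + ["Biography"] * 2
)

# Simple frequency distribution: each genre mapped to its count
frequency = Counter(responses)

total = len(responses)
for genre, count in frequency.most_common():
    # Show the count and its share of all responses
    print(f"{genre}: {count} ({count / total:.0%})")
```

The same counts can be reported either as numbers or as percentages, as the loop does.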
Example 2: A survey of students’ marks in mathematics. The frequency distribution table would look something like this:
The table groups the marks (out of 100) into ranges and gives the percentage of students who scored in each range. For example, 50% of the students scored between 51 and 75. This is called a grouped frequency distribution.
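A grouped frequency distribution can be sketched in the same way. In the snippet below the marks and the four ranges are assumptions chosen for illustration (the helper `mark_range` is hypothetical); the data is picked so that half of the students fall in the 51-75 range, echoing the example above.

```python
from collections import Counter

# Hypothetical marks out of 100 for 20 students (illustrative data only)
marks = [12, 30, 48, 52, 55, 60, 63, 68, 71, 75,
         78, 82, 88, 90, 95, 40, 58, 66, 73, 99]

def mark_range(mark):
    """Return the label of the range a mark falls into (assumed bin edges)."""
    if mark <= 25:
        return "0-25"
    if mark <= 50:
        return "26-50"
    if mark <= 75:
        return "51-75"
    return "76-100"

# Count how many marks fall into each range
grouped = Counter(mark_range(m) for m in marks)

# Convert the counts to percentages of all students
percentages = {label: 100 * count / len(marks) for label, count in grouped.items()}

for label in ["0-25", "26-50", "51-75", "76-100"]:
    print(f"{label}: {percentages.get(label, 0):.0f}%")
```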
Measures of central tendency focus on calculating the centre, or average, of a sample data set. Let’s take the first 5 responses of a survey in which people gave their age.
The mean is the average of the data and is denoted by ‘M’. To calculate the mean, add all the age responses and divide the sum by the total number of responses, denoted by ‘N’.
M = Sum of all responses / Total number of responses (N)
That is, the average age for the above data set is 30 years and 8 months.
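The mean calculation can be sketched in a few lines of Python. Because the age table is not reproduced here, this example borrows the six scores from the standard deviation example later in the article; treat the variable names as illustrative.

```python
from statistics import mean

# Scores borrowed from the standard deviation example later in the article
scores = [6, 13, 23, 45, 80, 67]

# M = sum of all responses / total number of responses (N)
n = len(scores)
m_manual = sum(scores) / n

# The standard library computes the same value
m_library = mean(scores)

print(m_manual, m_library)
```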
The median is the value exactly at the centre of the data set. To find it, arrange the responses in ascending order and locate the middle value.
Example 1: The above data set has an odd number of responses, so the median is the single middle value, that is, the third of the five sorted responses.
Example 2: Now suppose the data set has an even number of responses. There are then two middle values, and the median is their mean.
Median = Mean of the middle two values
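Both median cases can be checked with `statistics.median`. The ages below are hypothetical and serve only to show an odd-sized and an even-sized data set.

```python
from statistics import median

# Hypothetical ages (illustrative data only)
odd_ages = [41, 22, 35, 28, 30]        # five responses
even_ages = [41, 22, 35, 28, 30, 26]   # six responses

# Odd count: sorted values are 22, 28, 30, 35, 41, so the middle value is 30
median_odd = median(odd_ages)

# Even count: sorted values are 22, 26, 28, 30, 35, 41,
# so the median is the mean of the two middle values, (28 + 30) / 2
median_even = median(even_ages)

print(median_odd, median_even)
```

`median` sorts the data internally, so the responses do not need to be arranged beforehand.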
The mode is the most frequently occurring response value. A data set can have no mode, one mode, or several modes.
Example: In the above survey there is no mode, since no age is repeated. If the responses are modified so that one age appears more than once, that age becomes the mode.
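The three situations (no mode, one mode, several modes) can be sketched with `statistics.multimode`. The ages are hypothetical, and the helper `modes` is an illustrative wrapper: `multimode` on its own returns every value when nothing repeats, whereas this article treats that case as having zero modes.

```python
from statistics import multimode

# Hypothetical ages (illustrative data only)
no_repeats = [22, 28, 30, 35, 41]
one_repeat = [22, 28, 28, 35, 41]
two_repeats = [22, 22, 28, 35, 35]

def modes(values):
    """Return the most frequent values, or an empty list if nothing repeats."""
    if not values:
        return []
    most_common = multimode(values)
    # If even the most common value occurs only once, report no mode
    if values.count(most_common[0]) < 2:
        return []
    return most_common

print(modes(no_repeats), modes(one_repeat), modes(two_repeats))
```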
Variability describes how scattered the data set is. This spread of values is captured by the following measures of variability.
The range shows how far apart the two extremes of the data are. It is calculated by subtracting the lowest value in the data set from the highest one.
Example: For the above survey of ages;
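As a minimal sketch, the range can be computed with the built-in `max` and `min`. The scores are borrowed from the standard deviation example later in the article, since the age table is not reproduced here.

```python
# Scores borrowed from the standard deviation example later in the article
scores = [6, 13, 23, 45, 80, 67]

# Range = highest value - lowest value (here 80 - 6)
data_range = max(scores) - min(scores)

print(data_range)
```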
The standard deviation is the average amount of variability in the data set. It shows how far, on average, each value lies from the mean, so a larger standard deviation means more variation in the data.
Example: For the above survey of ages.
The calculation has six steps.
Step 1) List each score and calculate the mean (M).
Step 2) Find each deviation: Score – Mean.
Step 3) Square each deviation.
Step 4) Add up all the squared deviations.
Score – Mean = Deviation from mean (squared deviation)
6 – 39 = -33 (1089)
13 – 39 = -26 (676)
23 – 39 = -16 (256)
45 – 39 = 6 (36)
80 – 39 = 41 (1681)
67 – 39 = 28 (784)
M = 39
Sum of deviations = 0
Sum of squares = 4522
Step 5) Divide the sum of squared deviations by N – 1.
4522 / (6 – 1) = 904.4
Step 6) Take the square root of the result.
√904.4 ≈ 30.07 is the standard deviation.
The variance is simply the square of the standard deviation. It indicates the degree to which the data is scattered and is denoted by ‘s²’.
Example: For the standard deviation example above, the variance is:
s ≈ 30.07
s² = 904.4
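The six steps can be followed one by one in Python, using the six scores from the example. This is a sketch of the sample standard deviation (dividing the sum of squares by N – 1, as in Step 5); the variable names are illustrative, and the standard library's `stdev` and `variance` are included only as a cross-check.

```python
import math
from statistics import stdev, variance

scores = [6, 13, 23, 45, 80, 67]

n = len(scores)
m = sum(scores) / n                          # Step 1: mean of the scores
deviations = [x - m for x in scores]         # Step 2: score - mean
squared = [d ** 2 for d in deviations]       # Step 3: square each deviation
sum_of_squares = sum(squared)                # Step 4: add the squared deviations
sample_variance = sum_of_squares / (n - 1)   # Step 5: divide by N - 1
sample_sd = math.sqrt(sample_variance)       # Step 6: take the square root

print(round(sample_variance, 1), round(sample_sd, 2))

# Cross-check against the standard library's sample statistics
print(math.isclose(sample_sd, stdev(scores)))
print(math.isclose(sample_variance, variance(scores)))
```

Step 5 yields the variance directly, so the standard deviation is its square root and the variance is the square of the standard deviation.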
With these calculations in hand, it becomes much easier to work with descriptive statistics and draw the necessary conclusions about a data set.