I believe in practical applications rather than textbook recitation, so I intend to explain the topic without a textbook approach. I won’t be explaining simple terms like “data”; you can Google those definitions. I hope you don’t find it too unorthodox.
Statistics is all about collecting, analyzing, interpreting and presenting data. The results are often estimates, so statistics, unlike physics or chemistry, deals more in approximations than in exact answers. Data can be anything: numbers, names, colours or whatever you are working on. For calculations, it is better to convert raw data into frequency data.
Raw data is data that is not arranged in any manner. For example, take this set of numbers:
1, 4, 5, 6, 9, 9, 9, 6, 5, 4
Why is it raw? Because it is just lying there; no effort has been made to arrange it and make it more presentable. If I transform this data into frequency form, it looks like this:
A single “1”, two “4”s, two “5”s, two “6”s and three “9”s. In statistical terms, “1” has a frequency of one and “9” has a frequency of three.
Frequency means the number of occurrences: the number “1” has occurred once and the number “9” has occurred three times. We often use a variable to denote the data. For example, we could write the variable “X” instead of the word “numbers” and note underneath what X stands for, which makes things simpler.
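If you want to see the raw-to-frequency step done by a computer, here is a minimal Python sketch. It uses `Counter` from the standard library; the variable names `raw_data` and `frequency` are just my own labels for illustration.

```python
from collections import Counter

# The raw data from the example above
raw_data = [1, 4, 5, 6, 9, 9, 9, 6, 5, 4]

# Counter maps each distinct value to its number of occurrences (its frequency)
frequency = Counter(raw_data)

# Print the frequency table in ascending order of value
for value in sorted(frequency):
    print(value, "occurs", frequency[value], "time(s)")
```

The frequencies add up to 10, which is the number of values in the raw data.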
Now we can move on to the measures of central tendency. A measure of central tendency is a single representative value that shows what the data as a whole tends towards. Different measures can give different values, but each describes where the data is centred. How? Let’s see.
The first and easiest measure is called the mode. The mode is the member of the data that occurs most often. In our example of numbers, the mode is 9 because it occurs the most (three times). When data is raw, we have to count the occurrences of every element by hand and declare the one with the highest count to be the mode. This is somewhat tedious, which is why we use frequencies. With our data sorted into frequencies, we only need to see which element has the highest frequency and declare it to be the mode. Highest frequency means highest number of occurrences. Data can have no mode at all; this happens when too many elements share the same frequency or none is higher than the rest. Having two or three modes is generally accepted, so if two numbers share the highest frequency, we can declare both to be modes.
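Here is a rough sketch of that idea in Python. The function name `modes` is hypothetical, and the "no mode" rule is my own assumption about where to draw the line: I report no mode only when every distinct value has the same frequency.

```python
from collections import Counter

def modes(data):
    """Return a sorted list of the modes of data (possibly empty)."""
    frequency = Counter(data)
    if not frequency:
        return []
    highest = max(frequency.values())
    winners = sorted(v for v, count in frequency.items() if count == highest)
    # Assumed convention: if every distinct value ties, none is higher,
    # so we say there is no mode.
    if len(frequency) > 1 and len(winners) == len(frequency):
        return []
    return winners

print(modes([1, 4, 5, 6, 9, 9, 9, 6, 5, 4]))  # one mode
print(modes([1, 1, 2, 2, 3]))                 # two modes
print(modes([1, 2, 3]))                       # no mode
```

For our example data this gives a single mode, 9.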
Next comes the median. The median is the element at the middle or central position of the data. The first step is to count the number of data values. This is again easier with frequency data because the sum of the frequencies is the total number of data values. How? Add up all the occurrences and we get the total number of data values.
- To get the middle position, add “1” to the total number of data values (the sum of the frequencies) and divide by “2”. The resulting number is our middle position.
- Arrange the data in either ascending or descending order. This is necessary because the middle position only corresponds to the right element when the data is ordered.
- If the data is not in the form of frequencies, count through the data from the start until you reach the element at the middle position. If the data is in the form of frequencies, add the first frequency to the second and so on until this running sum of frequencies is equal to or just greater than our middle position. The element whose frequency took the running sum up to the middle position is our median.
- If the middle position is a decimal like “6.5” (this happens when the number of data values is even), we have to find the elements at the 6th and 7th positions, add them up and divide by “2”. This average approximates the element at the 6.5th position.
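The steps above can be sketched in Python as follows. This is only one way to do it; the helper names `element_at` and `median_from_frequencies` are made up for this illustration, and the code assumes numeric data so that two middle elements can be averaged.

```python
from collections import Counter

def element_at(position, frequency):
    """Return the element at a 1-based position, using a running sum of frequencies."""
    running = 0
    for value in sorted(frequency):          # ascending order
        running += frequency[value]
        if running >= position:              # running sum reached the position
            return value
    raise ValueError("position is beyond the end of the data")

def median_from_frequencies(frequency):
    total = sum(frequency.values())          # total number of data values
    if total == 0:
        raise ValueError("no data")
    middle = (total + 1) / 2                 # middle position
    lower = element_at(int(middle), frequency)
    if middle == int(middle):                # whole-number position
        return lower
    # Decimal position such as 5.5: average the two neighbouring elements
    upper = element_at(int(middle) + 1, frequency)
    return (lower + upper) / 2

frequency = Counter([1, 4, 5, 6, 9, 9, 9, 6, 5, 4])
print(median_from_frequencies(frequency))
```

For our example there are 10 values, so the middle position is 5.5. The 5th element is 5 and the 6th is 6, so the median comes out as 5.5.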
The mean is what most people call the average, but in statistical terms “average” is broader: all three measures of central tendency are averages, just different ways of averaging. The basic formula for the mean is:
- Sum of data divided by number of data.
So we add up all the data and divide by the number of data values. There’s a slight change when the data is arranged with frequencies. We multiply each element by its frequency and sum the results. This is because if we have three 9s, adding them up gives 27, but if we have them arranged by frequency, we only need to multiply 9 by its frequency of 3 to get 27. Frequencies make all this easier. So once we have multiplied the elements by their frequencies, we add the products up and divide by the sum of all frequencies (which is the number of data values).
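To check that both routes give the same answer, here is a small Python sketch using our example numbers. The variable names are my own; nothing here is special to any library.

```python
# Frequency table for the example data: value -> number of occurrences
frequency = {1: 1, 4: 2, 5: 2, 6: 2, 9: 3}

# Multiply each element by its frequency and add the products up
weighted_sum = sum(value * count for value, count in frequency.items())

# The sum of the frequencies is the number of data values
total = sum(frequency.values())
mean_from_frequencies = weighted_sum / total

# The same calculation on the raw data, for comparison
raw_data = [1, 4, 5, 6, 9, 9, 9, 6, 5, 4]
mean_from_raw = sum(raw_data) / len(raw_data)

print(mean_from_frequencies, mean_from_raw)
```

Both come to 58 divided by 10, so the mean is 5.8 either way.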
These are the very basics of statistics, and they will be helpful in high school or college. I hope statistics turns out to be easy for you.