In my previous article, “How to Build a Frequency Table,” I described the columns a frequency table needs and how to fill them in. Once those columns are complete, they are charted as a Histogram. If you need a refresher, it may help to re-read that article before continuing.
The Class Boundaries are chosen based on the data we wish to represent in the Histogram. The difference between the high and low boundary of a class is the same for every class and is known as the Class Width. The Class Width will differ from one set of data to the next. We will treat each class as having a width of one. In other words, we measure widths in units of the Class Width, so each class is exactly one Class Width wide. We will see the value in doing this in a moment.
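As a minimal sketch of the width-one idea, the Python below builds evenly spaced Class Boundaries and then measures each class in units of the Class Width. The helper names, the starting boundary of 0.5 and the width of 5 are hypothetical choices for whole-number data running from 1 to 30. Placing boundaries halfway between whole numbers is one common way to make sure no value sits exactly on a boundary.

```python
def make_boundaries(start, width, num_classes):
    """Return num_classes + 1 evenly spaced Class Boundaries (hypothetical helper)."""
    return [start + i * width for i in range(num_classes + 1)]


def widths_in_class_units(boundaries):
    """Width of each class divided by the Class Width, so every class measures 1."""
    class_width = boundaries[1] - boundaries[0]
    return [(high - low) / class_width for low, high in zip(boundaries, boundaries[1:])]


# Assumed example: six classes of width 5, starting at 0.5.
boundaries = make_boundaries(0.5, 5, 6)
print(boundaries)                         # [0.5, 5.5, 10.5, 15.5, 20.5, 25.5, 30.5]
print(widths_in_class_units(boundaries))  # [1.0, 1.0, 1.0, 1.0, 1.0, 1.0]
```

In the data's own units every class is 5 wide; in Class Width units every class is 1 wide.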
The Frequency of each class is the number of elements in the data set that fall between the low and high Class Boundaries of that class. The Relative Frequency of each class is its Frequency divided by the total number of values in the data set. This means the Relative Frequency represents the proportion of the data in the class. It can also be considered the probability that a value picked at random from the data set falls into the class.
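The two definitions translate directly into a few lines of Python. This is a sketch under stated assumptions: the 20-value data set and the boundaries are made up, the function name frequency_table is hypothetical, and a value x is counted in a class when low <= x < high. Fraction is used only to keep the arithmetic exact.

```python
from fractions import Fraction

# Hypothetical data set of 20 whole numbers.
data = [2, 3, 5, 7, 8, 8, 9, 11, 12, 14, 15, 15, 16, 18, 19, 21, 22, 24, 25, 27]
boundaries = [0.5, 5.5, 10.5, 15.5, 20.5, 25.5, 30.5]


def frequency_table(data, boundaries):
    """Return (low, high, frequency, relative frequency) for each class.

    A value x is counted in a class when low <= x < high; with boundaries
    halfway between whole numbers, no value ever lands exactly on one.
    """
    n = len(data)
    rows = []
    for low, high in zip(boundaries, boundaries[1:]):
        frequency = sum(1 for x in data if low <= x < high)
        rows.append((low, high, frequency, Fraction(frequency, n)))
    return rows


for low, high, frequency, relative in frequency_table(data, boundaries):
    print(f"{low} to {high}: frequency {frequency}, relative frequency {float(relative):.2f}")
```

For this made-up data the frequencies come out as 3, 4, 5, 3, 4 and 1, so the Relative Frequencies are 0.15, 0.20, 0.25, 0.15, 0.20 and 0.05.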
The Histogram is then drawn as a series of upright rectangles, one per class. Each rectangle spans its class's low and high Class Boundaries on the horizontal axis, and its height on the vertical axis is that class's Relative Frequency. The total number of rectangles therefore equals the number of classes the data was divided into.
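A plotting library would normally draw the picture, but the rectangles themselves are easy to list. The minimal Python sketch below uses made-up frequencies for six classes, builds one (left edge, right edge, height) triple per class, and prints a rough text version of the Histogram with the bars lying on their side. The variable names and the one-'#'-per-percentage-point scale are assumptions for illustration only.

```python
from fractions import Fraction

# Hypothetical frequencies for six classes of width 5 (20 values in total).
boundaries = [0.5, 5.5, 10.5, 15.5, 20.5, 25.5, 30.5]
frequencies = [3, 4, 5, 3, 4, 1]
n = sum(frequencies)

# One rectangle per class: (left edge, right edge, height = relative frequency).
rectangles = [
    (low, high, Fraction(freq, n))
    for low, high, freq in zip(boundaries, boundaries[1:], frequencies)
]

# Crude text rendering: one '#' per percentage point of height.
for low, high, height in rectangles:
    print(f"{low:>5} to {high:>5} | {'#' * int(height * 100)}")
```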
Now recall how to calculate the area of a rectangle: length times width. The width of each rectangle is one Class Width, which we are counting as one, and its height is the Relative Frequency. Because anything multiplied by one is itself, the area is simply the Relative Frequency. So the Relative Frequency equals the area of the rectangle that represents the class. Remember that the Relative Frequency also represents the probability of a data value being in that class, so the area does too.
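The arithmetic can be checked with a short Python sketch. It assumes made-up frequencies for six classes (20 values in total, Class Width 5) and a hypothetical rectangle_areas helper. Measuring width in Class Widths makes each area equal the Relative Frequency; measuring it in the data's own units would scale every area by the Class Width, which is why the width-one convention is convenient.

```python
from fractions import Fraction

# Hypothetical frequencies for six classes, 20 values in total.
frequencies = [3, 4, 5, 3, 4, 1]


def rectangle_areas(frequencies, width):
    """Area of each rectangle: width times height, where height = relative frequency."""
    total = sum(frequencies)
    return [width * Fraction(freq, total) for freq in frequencies]


# Width measured in Class Widths (each class is 1 wide): area equals relative frequency.
print([str(a) for a in rectangle_areas(frequencies, 1)])
# ['3/20', '1/5', '1/4', '3/20', '1/5', '1/20']

# Width measured in the data's own units (each class is 5 wide): every area is 5 times larger.
print([str(a) for a in rectangle_areas(frequencies, 5)])
# ['3/4', '1', '5/4', '3/4', '1', '1/4']
```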
Let’s now put it all together. The area of the rectangle drawn between the low and high boundary of any class is equal to the probability of a data value falling between that class's boundaries. Likewise, the combined area of all the rectangles between any low Class Boundary and any higher Class Boundary is the probability of a data value falling between those two boundaries.
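Here is a minimal Python sketch of this idea under assumed data. It sums the rectangle areas between two chosen Class Boundaries and compares the result with a direct count of the data values in that range. The data set, the boundaries and the helper names class_areas and area_between are hypothetical.

```python
from fractions import Fraction

# Hypothetical data set of 20 whole numbers and six classes of width 5.
data = [2, 3, 5, 7, 8, 8, 9, 11, 12, 14, 15, 15, 16, 18, 19, 21, 22, 24, 25, 27]
boundaries = [0.5, 5.5, 10.5, 15.5, 20.5, 25.5, 30.5]


def class_areas(data, boundaries):
    """Area of each rectangle (width counted as one Class Width) = relative frequency."""
    n = len(data)
    return [
        Fraction(sum(1 for x in data if low <= x < high), n)
        for low, high in zip(boundaries, boundaries[1:])
    ]


def area_between(areas, boundaries, low, high):
    """Sum the areas of the rectangles lying between two Class Boundaries.

    low and high must both be boundaries that appear in the list.
    """
    i, j = boundaries.index(low), boundaries.index(high)
    return sum(areas[i:j], Fraction(0))


areas = class_areas(data, boundaries)
p_area = area_between(areas, boundaries, 5.5, 20.5)
p_count = Fraction(sum(1 for x in data if 5.5 <= x < 20.5), len(data))
print(p_area, p_count)  # 3/5 3/5
```

Both routes give 12 of the 20 values, so the combined area of the three middle rectangles matches the proportion of data between 5.5 and 20.5.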
The Histogram is the first place most students get to relate area to probability, and it is an excellent analogy for the probability distributions they will encounter later in their studies. It should also be pointed out that the total area of all the rectangles that make up the Histogram is equal to one. This is the same as saying that the probability of a value in the data set falling between the low boundary of the first class and the high boundary of the last class is 100%.
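The claim that the total area is one does not depend on the particular data. As a hedged illustration, the Python sketch below draws a hypothetical random data set, counts each class with exact fractions, and adds up the rectangle areas. The sample size, the seed and the boundaries are arbitrary assumptions. The total comes out as exactly one whenever every value lies between the first and last Class Boundary.

```python
import random
from fractions import Fraction

random.seed(1)  # fixed seed so the run is repeatable

# Hypothetical data: 50 random whole numbers from 1 to 30.
data = [random.randint(1, 30) for _ in range(50)]
boundaries = [0.5, 5.5, 10.5, 15.5, 20.5, 25.5, 30.5]

# Area of each rectangle = relative frequency (width counted as one Class Width).
areas = [
    Fraction(sum(1 for x in data if low <= x < high), len(data))
    for low, high in zip(boundaries, boundaries[1:])
]

total_area = sum(areas)
print(total_area)  # 1, because every value falls in exactly one class
```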
In the same way, the area under any probability density function is always one.
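To see the same property for a continuous distribution, the Python sketch below numerically approximates the area under one example density, the normal curve, using the midpoint rule. The choice of distribution, the integration limits of 10 standard deviations either side of the mean, and the number of steps are assumptions. Any valid probability density function should give an area of about one.

```python
import math


def normal_pdf(x, mean=0.0, sd=1.0):
    """Density of the normal distribution (used here only as an example)."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2 * math.pi))


def area_under(f, a, b, steps=100_000):
    """Approximate the area under f from a to b with the midpoint rule."""
    h = (b - a) / steps
    return h * sum(f(a + (i + 0.5) * h) for i in range(steps))


# The tails beyond 10 standard deviations from the mean are negligibly small,
# so integrating over [-10, 10] captures essentially all of the area.
area = area_under(normal_pdf, -10.0, 10.0)
print(f"{area:.6f}")  # 1.000000
```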