Grouped Data and Class Intervals
Organise large data sets into class intervals, estimate the mean using midpoints, find the modal and median classes, and read a frequency table, with worked examples.
Key takeaways
- Grouping data into class intervals tames large data sets but loses exact values
- Estimate the mean using midpoints: total of (midpoint × frequency) ÷ total frequency
- The modal class has the highest frequency; the median class contains the middle value
When there is too much data
Imagine recording the height of 200 students. A frequency table listing every single height (161 cm, 161.5 cm, 162 cm) would be enormous and hard to read. Instead we sort the data into class intervals, such as 150–160 cm and 160–170 cm. This is grouped data: easier to handle and to chart, at the cost of losing the exact values.
Reading class intervals
A class interval is a range of values written with inequalities to avoid overlap. Consider the time, in minutes, that 30 students spent on homework.
| Time, t (minutes) | Frequency |
|---|---|
| 0 ≤ t < 10 | 4 |
| 10 ≤ t < 20 | 11 |
| 20 ≤ t < 30 | 9 |
| 30 ≤ t < 40 | 6 |
The notation 10 ≤ t < 20 means "10 or more, but less than 20". A student who spent exactly 20 minutes falls in the next class (20 ≤ t < 30), not this one. This careful boundary keeps every value in exactly one class. Total frequency = 4 + 11 + 9 + 6 = 30.
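The boundary rule can be checked with a short Python sketch. The class edges come from the table above; the names `bounds` and `find_class` are illustrative assumptions, not part of the lesson.

```python
from bisect import bisect_right

# Class edges from the homework table: 0, 10, 20, 30, 40 (minutes).
bounds = [0, 10, 20, 30, 40]

def find_class(t):
    """Return (lower, upper) for the class with lower <= t < upper."""
    if not bounds[0] <= t < bounds[-1]:
        raise ValueError("value lies outside every class")
    i = bisect_right(bounds, t) - 1  # index of the last edge that is <= t
    return bounds[i], bounds[i + 1]

print(find_class(19.9))  # (10, 20)
print(find_class(20))    # (20, 30): exactly 20 belongs to the next class
```

Because each class includes its lower bound and excludes its upper bound, every time from 0 up to (but not including) 40 lands in exactly one class.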
The modal class
Because we no longer have individual values, we cannot give a single mode. Instead we give the modal class: the interval with the highest frequency.
The biggest frequency is 11, so the modal class is 10 ≤ t < 20 minutes.
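As a minimal sketch, the same lookup in Python might read as follows. The dictionary `frequencies` is an assumed way of storing the table, keyed by each class's lower and upper bounds.

```python
# Frequencies from the homework table, keyed by (lower, upper) class bounds.
frequencies = {(0, 10): 4, (10, 20): 11, (20, 30): 9, (30, 40): 6}

# The modal class is the key with the largest frequency.
modal_class = max(frequencies, key=frequencies.get)
print(modal_class, frequencies[modal_class])  # (10, 20) 11
```

If two classes tied for the highest frequency, `max` would return only the first of them, so a tie would need to be checked separately.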
Estimating the mean with midpoints
We cannot find the exact mean, because we do not know each student's precise time. The standard method is to assume every value sits at the midpoint of its class. The midpoint is halfway between the class bounds.
| Time, t | Midpoint (x) | Frequency (f) | f × x |
|---|---|---|---|
| 0 ≤ t < 10 | 5 | 4 | 20 |
| 10 ≤ t < 20 | 15 | 11 | 165 |
| 20 ≤ t < 30 | 25 | 9 | 225 |
| 30 ≤ t < 40 | 35 | 6 | 210 |
| Total | | 30 | 620 |
Now apply the formula:
Estimated mean = total of (f × x) ÷ total frequency
Estimated mean = 620 ÷ 30 = 20.7 minutes (to 1 decimal place).
This is an estimate, because the midpoint assumption is rarely exactly true. We always write "estimated mean" for grouped data, never just "mean".
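The midpoint calculation above can be reproduced with a short Python sketch. The variable names are illustrative assumptions; the frequencies are those from the homework table.

```python
# Frequencies from the homework table, keyed by (lower, upper) class bounds.
frequencies = {(0, 10): 4, (10, 20): 11, (20, 30): 9, (30, 40): 6}

total_f = 0    # running total of frequencies
total_fx = 0   # running total of f * x
for (lower, upper), f in frequencies.items():
    midpoint = (lower + upper) / 2  # halfway between the class bounds
    total_f += f
    total_fx += f * midpoint

estimated_mean = total_fx / total_f
print(total_fx, total_f)         # 620.0 30
print(round(estimated_mean, 1))  # 20.7
```

The result matches the table: a total of 620 divided by 30 students gives an estimated mean of 20.7 minutes.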
The median class
With 30 values, the median sits around the 15th–16th value. We find which class it falls into using a cumulative frequency (a running total).
| Time, t | Frequency | Cumulative frequency |
|---|---|---|
| 0 ≤ t < 10 | 4 | 4 |
| 10 ≤ t < 20 | 11 | 15 |
| 20 ≤ t < 30 | 9 | 24 |
| 30 ≤ t < 40 | 6 | 30 |
- The first 4 values are in class 1.
- By the end of class 2 we have reached the 15th value.
- The 16th value falls in the next class.
The 15th value is the last one in the second class and the 16th is the first one in the third, so the middle of the data falls at the boundary between them, 20 minutes. A time of exactly 20 minutes belongs to 20 ≤ t < 30, so the median class is best given as 20 ≤ t < 30 minutes. When asked only for the median class, identify the interval containing the middle position.
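The running-total search can be sketched in Python as below. This assumes the middle position is taken as (n + 1) ÷ 2, which is 15.5 here; a course that uses n ÷ 2 = 15 instead would stop at the second class for this table, so check which convention applies. The names `classes`, `freqs` and `median_class` are illustrative.

```python
from itertools import accumulate

# Classes and frequencies from the homework table.
classes = [(0, 10), (10, 20), (20, 30), (30, 40)]
freqs = [4, 11, 9, 6]

cumulative = list(accumulate(freqs))  # running totals
n = cumulative[-1]                    # total frequency
middle = (n + 1) / 2                  # assumed convention: 15.5 for n = 30

# First class whose running total reaches the middle position.
median_class = next(c for c, cf in zip(classes, cumulative) if cf >= middle)
print(cumulative)    # [4, 15, 24, 30]
print(median_class)  # (20, 30)
```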
Activity: measure and group
- Collect 20–30 values, such as the length of each classmate's shoe in centimetres, or reaction times.
- Choose sensible equal class intervals (for example widths of 2 cm) and tally each value into one class.
- Identify the modal class.
- Build midpoint and f × x columns and calculate the estimated mean.
- Use cumulative frequency to find the median class, and discuss why your mean is only an estimate.
Why this matters
Grouped data is how real surveys, censuses and scientific studies summarise thousands of measurements, and the midpoint method is the standard way to estimate an average from them. The trade-off, convenience for a small loss of accuracy, is a key idea in statistics. Strengthen the underlying skill with averages from a frequency table, and revisit the core averages in mean, median, mode and range.
Quick quiz
For the class interval 10 ≤ x < 20, what midpoint do you use to estimate the mean?
The midpoint is halfway between the bounds: (10 + 20) ÷ 2 = 15.
Why is the mean from grouped data only an ESTIMATE?
Grouping discards the exact values, so we assume each item sits at its class midpoint, giving an estimate rather than the true mean.
Which is the modal class if frequencies are: 0–10 has 4, 10–20 has 11, 20–30 has 6?
The modal class is the interval with the highest frequency, which is 10–20 with 11.
With 30 data values, which position do you locate to find the median class?
The median lies at the middle, around the 15th value (between the 15th and 16th), so you count cumulative frequencies to reach it.
FAQ
Grouped data is data sorted into class intervals such as 0–10 and 10–20, rather than listing every individual value. It makes large data sets manageable.
Once data is grouped, the individual values are lost. We assume each value sits at the midpoint of its class, so the mean we calculate is an estimate.