Scatter Graphs and Correlation
Plot scatter graphs, describe positive, negative and zero correlation, draw a line of best fit, and use it to make predictions. Worked examples, a table and a quiz.
Key takeaways
- A scatter graph plots paired data to reveal a relationship between two variables
- Correlation can be positive, negative or zero, and weak or strong
- A line of best fit lets you predict, but correlation is not the same as causation
Showing two things at once
Many questions in science and everyday life involve two variables: Does studying more raise test scores? Do older cars cost less? A scatter graph answers these by plotting paired data (one variable on the x-axis, the other on the y-axis) with one dot for each pair. The pattern of the dots reveals whether the two are linked.
Plotting a scatter graph
Suppose we record, for 8 students, the hours they revised and their test mark.
| Hours revised (x) | Test mark % (y) |
|---|---|
| 1 | 35 |
| 2 | 50 |
| 3 | 45 |
| 4 | 60 |
| 5 | 65 |
| 6 | 70 |
| 7 | 78 |
| 8 | 85 |
Each row becomes one point, for example (1, 35) and (8, 85). You do not join the dots with a zig-zag line, the way you would on a line graph. You leave them as separate points and look at the overall shape they make.
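As a rough illustration of turning table rows into points, here is a minimal Python sketch. The names `hours`, `marks`, `points` and `text_scatter` are made up for this example, and the text plot is only a crude stand-in for a graph drawn on paper: it assumes the x-values are evenly spaced and already in order, and it groups marks into bands of 10.

```python
# Data from the table above: hours revised (x) and test mark % (y).
hours = [1, 2, 3, 4, 5, 6, 7, 8]
marks = [35, 50, 45, 60, 65, 70, 78, 85]

# Each row of the table becomes one (x, y) point.
points = list(zip(hours, marks))

def text_scatter(points, y_min=30, y_max=90, y_step=10):
    """Return a rough text scatter plot: one '*' per point, never joined up.

    Assumes the points are sorted by x and evenly spaced along the x-axis.
    """
    rows = []
    for top in range(y_max, y_min, -y_step):
        bottom = top - y_step
        # Put a '*' in each column whose y-value falls in this band.
        cells = ["*" if bottom < y <= top else " " for _, y in points]
        rows.append(f"{top:>3} | " + "  ".join(cells))
    rows.append("    +-" + "-" * (3 * len(points) - 2))
    rows.append("      " + "  ".join(str(x) for x, _ in points))
    return "\n".join(rows)

print(points[0], points[-1])  # (1, 35) (8, 85)
print(text_scatter(points))
```

The stars stay as separate marks, and the overall shape climbs from bottom-left to top-right.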
Types of correlation
Correlation describes the trend in the cloud of points.
| Type | What the points do | Real example |
|---|---|---|
| Positive correlation | Slope up: as x rises, y rises | Revision hours and test marks |
| Negative correlation | Slope down: as x rises, y falls | A car's age and its value |
| Zero correlation | No pattern; scattered randomly | Shoe size and exam mark |
Correlation also has strength. If the points lie almost on a straight line, the correlation is strong. If they are loosely scattered around a trend, it is weak. Our revision data slopes clearly upward with points close to a line, so it shows strong positive correlation.
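Strength can also be put into a number. One common measure, which this lesson does not otherwise use, is Pearson's correlation coefficient r: values near +1 suggest strong positive correlation, values near -1 suggest strong negative correlation, and values near 0 suggest little or no straight-line relationship. The sketch below computes it by hand for the revision data; the function name `pearson_r` is just a label chosen for this example.

```python
from math import sqrt

# Revision data from the table: hours revised (x) and test mark % (y).
hours = [1, 2, 3, 4, 5, 6, 7, 8]
marks = [35, 50, 45, 60, 65, 70, 78, 85]

def pearson_r(xs, ys):
    """Return Pearson's correlation coefficient for paired data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # How x and y vary together, and how each varies on its own.
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    return s_xy / sqrt(s_xx * s_yy)

r = pearson_r(hours, marks)
print(round(r, 2))  # 0.98
```

A value of about 0.98 agrees with the description of the revision data as strong positive correlation.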
The line of best fit
A line of best fit is a single straight line drawn through the middle of the trend, with roughly equal numbers of points above and below it. It does not have to pass through any actual point, and it should follow the general direction rather than chase individual dots.
Tips for drawing one well:
- Aim for the line to pass close to as many points as possible.
- Balance the points: about half above, half below.
- It is often (but not always) sensible to make it pass near the mean point, which is the point made from the mean of all the x-values and the mean of all the y-values.
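The tips above describe a line drawn by eye. As a point of comparison, the sketch below finds the mean point and then fits a least-squares line, which is a calculated alternative to a hand-drawn line and not the method this lesson teaches. The helper names `mean_point` and `least_squares_line` are invented for this example.

```python
# Revision data from the table: hours revised (x) and test mark % (y).
hours = [1, 2, 3, 4, 5, 6, 7, 8]
marks = [35, 50, 45, 60, 65, 70, 78, 85]

def mean_point(xs, ys):
    """Return (mean of the x-values, mean of the y-values)."""
    return sum(xs) / len(xs), sum(ys) / len(ys)

def least_squares_line(xs, ys):
    """Return (gradient, intercept) of the line y = gradient * x + intercept."""
    mean_x, mean_y = mean_point(xs, ys)
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    gradient = s_xy / s_xx
    # Choosing the intercept this way forces the line through the mean point.
    intercept = mean_y - gradient * mean_x
    return gradient, intercept

gradient, intercept = least_squares_line(hours, marks)
print(mean_point(hours, marks))                 # (4.5, 61.0)
print(round(gradient, 2), round(intercept, 2))  # 6.79 30.46

# Check the balance of points on either side of the line.
above = sum(y > gradient * x + intercept for x, y in zip(hours, marks))
below = sum(y < gradient * x + intercept for x, y in zip(hours, marks))
print(above, below)                             # 5 3
```

The calculated line passes through the mean point (4.5, 61) and leaves 5 points above and 3 below, so the balance is approximate, not exact. A careful hand-drawn line should look similar.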
Making predictions
Once you have a line of best fit, you can predict a missing value.
Worked example. Using the revision line, estimate the test mark for a student who revised 4.5 hours.
- Find 4.5 on the x-axis.
- Go straight up to the line of best fit.
- Read across to the y-axis. You would read roughly 62%.
Predicting within the range of the data (here, 1 to 8 hours) is called interpolation and is fairly reliable. Predicting far outside the data (say, the mark after 40 hours) is called extrapolation and is risky, because the trend may not continue.
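To see why extrapolation is risky, the sketch below predicts from a calculated least-squares line instead of a hand-drawn one, so its numbers will differ slightly from a reading taken off a graph. The function names `fit_line` and `predict_mark` are hypothetical, chosen only for this example.

```python
# Revision data from the table: hours revised (x) and test mark % (y).
hours = [1, 2, 3, 4, 5, 6, 7, 8]
marks = [35, 50, 45, 60, 65, 70, 78, 85]

def fit_line(xs, ys):
    """Return (gradient, intercept) of the least-squares line through the data."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    gradient = s_xy / s_xx
    return gradient, mean_y - gradient * mean_x

def predict_mark(x, xs, ys):
    """Return (estimated y, 'interpolation' or 'extrapolation') for a given x."""
    gradient, intercept = fit_line(xs, ys)
    estimate = gradient * x + intercept
    # Inside the range of recorded x-values counts as interpolation.
    kind = "interpolation" if min(xs) <= x <= max(xs) else "extrapolation"
    return estimate, kind

for x in (4.5, 40):
    estimate, kind = predict_mark(x, hours, marks)
    print(x, round(estimate, 1), kind)
# 4.5 61.0 interpolation
# 40 301.9 extrapolation
```

The interpolated estimate of about 61% sits close to the 62% read from a hand-drawn line. The extrapolated value of about 302% is impossible for a percentage mark, which shows how quickly a trend can stop making sense outside the data.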
Correlation is not causation
This is one of the most important warnings in statistics. A strong correlation does not prove that one variable causes the other.
A classic example: across a summer, ice cream sales and drowning incidents rise together, giving a strong positive correlation. Ice cream obviously does not cause drowning. A hidden third factor, hot weather, drives both: heat makes people buy ice cream and go swimming. Always ask whether a lurking variable could explain a correlation before claiming a cause.
Activity: collect and plot
- Collect paired data from classmates, such as arm span and height, or hours of sleep and a reaction-time test.
- Plot a scatter graph with sensible axes.
- Describe the correlation: positive, negative or zero, and strong or weak.
- Draw a line of best fit and use it to predict a value.
- Discuss whether the relationship looks like genuine cause and effect, or whether a third factor might be at work.
Why this matters
Scatter graphs are how scientists, economists and doctors test whether two things are connected, from diet and health to advertising and sales. The skill of reading a trend, predicting from it, and resisting the leap from correlation to causation is one of the most useful in all of data handling. Build the plotting skills further with the coordinate plane, and connect the line of best fit to straight line graphs and gradients.
Quick quiz
On a scatter graph, points that rise from bottom-left to top-right show...
When one variable increases as the other increases, the trend slopes upward, which is positive correlation.
More hours of exercise linked to lower resting heart rate is an example of...
As exercise goes up, heart rate goes down, so the variables move in opposite directions: negative correlation.
What is the purpose of a line of best fit?
A line of best fit follows the overall trend with points balanced on each side, and lets you estimate unknown values.
Ice cream sales and drowning incidents both rise in summer. This shows that...
Both are driven by hot weather. A correlation between two things does not prove one causes the other.
FAQ
Correlation describes how two variables are related. Positive correlation means they increase together, negative means one rises as the other falls, and zero means no clear link.
Correlation does not prove causation. It shows a relationship in the data, but a hidden third factor may be driving both variables.