Math · Ages 14-18 · Intermediate · 10 min read

Scatter Graphs and Correlation

Plot scatter graphs, describe positive, negative and zero correlation, draw a line of best fit, and use it to make predictions. Worked examples, a table and a quiz.

Key takeaways

  • A scatter graph plots paired data to reveal a relationship between two variables
  • Correlation can be positive, negative or zero, and weak or strong
  • A line of best fit lets you predict, but correlation is not the same as causation

Showing two things at once

Many questions in science and everyday life involve two variables: Does studying more raise test scores? Do older cars cost less? A scatter graph helps answer these by plotting paired data (one variable on the x-axis, the other on the y-axis), with one dot for each item. The pattern of the dots reveals whether the two are linked.

Plotting a scatter graph

Suppose we record, for 8 students, the hours they revised and their test mark.

Hours revised (x) | Test mark % (y)
1                 | 35
2                 | 50
3                 | 45
4                 | 60
5                 | 65
6                 | 70
7                 | 78
8                 | 85

Each row becomes one point, for example (1, 35) and (8, 85). You do not join the dots with a zig-zag line, the way you would on a line graph. You leave them as separate points and look at the overall shape they make.
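As a rough sketch of the same idea in code, the snippet below pairs the two columns of the table into (x, y) points and draws a crude text version of the graph. The function name `text_scatter` and the 10-mark bands are illustrative choices, not part of the lesson, and a real plot would normally be drawn on graph paper or with a plotting library.

```python
# Paired data copied from the table above.
hours = [1, 2, 3, 4, 5, 6, 7, 8]
marks = [35, 50, 45, 60, 65, 70, 78, 85]

# Each row of the table becomes one (x, y) point.
points = list(zip(hours, marks))


def text_scatter(points, y_step=10, y_max=90):
    """Return a crude text scatter graph: one '*' per point, dots never joined."""
    xs = [x for x, _ in points]
    x_values = range(min(xs), max(xs) + 1)
    rows = []
    for top in range(y_max, 0, -y_step):
        # The row labelled `top` holds marks in the band (top - y_step, top].
        cells = []
        for x in x_values:
            hit = any(px == x and top - y_step < py <= top for px, py in points)
            cells.append("*" if hit else ".")
        rows.append(f"{top:>3} | " + " ".join(cells))
    # x-axis labels, indented to line up with the cells above.
    rows.append("      " + " ".join(str(x) for x in x_values))
    return "\n".join(rows)


print(points[0], points[-1])  # (1, 35) (8, 85)
print(text_scatter(points))
```

The stars climb from bottom-left to top-right, which is the overall shape you are looking for rather than any zig-zag between neighbouring points.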

Types of correlation

Correlation describes the trend in the cloud of points.

Type                 | What the points do                  | Real example
Positive correlation | Slope up: as x rises, y rises       | Revision hours and test marks
Negative correlation | Slope down: as x rises, y falls     | A car's age and its value
Zero correlation     | No pattern; scattered randomly      | Shoe size and exam mark

Correlation also has strength. If the points lie almost on a straight line, the correlation is strong. If they are loosely scattered around a trend, it is weak. Our revision data slopes clearly upward with points close to a line, so it shows strong positive correlation.
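Direction and strength can also be summarised with a single number. The sketch below uses the Pearson correlation coefficient r, which is a calculated measure that goes beyond the by-eye judgement described above: r near +1 means strong positive, near -1 strong negative, near 0 no linear trend. The cut-offs in `describe` (0.7 and 0.2) are assumptions made for illustration only, since there is no single agreed boundary between weak and strong.

```python
from math import sqrt

# Revision data from the table earlier in the lesson.
hours = [1, 2, 3, 4, 5, 6, 7, 8]
marks = [35, 50, 45, 60, 65, 70, 78, 85]


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def describe(r, strong=0.7, negligible=0.2):
    """Label a correlation. The thresholds are illustrative assumptions."""
    if abs(r) < negligible:
        return "zero (or negligible) correlation"
    strength = "strong" if abs(r) >= strong else "weak"
    direction = "positive" if r > 0 else "negative"
    return f"{strength} {direction} correlation"


r = pearson_r(hours, marks)
print(round(r, 2), describe(r))  # 0.98 strong positive correlation
```

A value of about 0.98 agrees with what the graph shows: the points slope upward and sit close to a straight line.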

The line of best fit

A line of best fit is a single straight line drawn through the middle of the trend, with roughly equal numbers of points above and below it. It does not have to pass through any actual point, and it should follow the general direction rather than chase individual dots.

Tips for drawing one well:

  • Aim for the line to pass close to as many points as possible.
  • Balance the points: about half above, half below.
  • It is often (but not always) sensible to make it pass near the mean point: the point made from the mean of all the x-values and the mean of all the y-values.
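To check a hand-drawn line against the tips above, one option is to calculate a line instead. The sketch below finds the mean point of the revision data and then the least-squares line, which is a standard calculated line of best fit and not the by-eye method described here; it always passes exactly through the mean point. The function names `mean_point` and `least_squares_line` are made up for this example.

```python
# Revision data from the table earlier in the lesson.
hours = [1, 2, 3, 4, 5, 6, 7, 8]
marks = [35, 50, 45, 60, 65, 70, 78, 85]


def mean_point(xs, ys):
    """Return (mean of x-values, mean of y-values)."""
    return sum(xs) / len(xs), sum(ys) / len(ys)


def least_squares_line(xs, ys):
    """Return (gradient, intercept) of the line y = gradient * x + intercept."""
    mean_x, mean_y = mean_point(xs, ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    gradient = sxy / sxx
    # Choosing the intercept this way forces the line through the mean point.
    intercept = mean_y - gradient * mean_x
    return gradient, intercept


mx, my = mean_point(hours, marks)
gradient, intercept = least_squares_line(hours, marks)

# Count how the points balance either side of the line.
above = sum(1 for x, y in zip(hours, marks) if y > gradient * x + intercept)
below = sum(1 for x, y in zip(hours, marks) if y < gradient * x + intercept)

print((mx, my))                                # (4.5, 61.0)
print(round(gradient, 2), round(intercept, 2))  # 6.79 30.46
print(above, below)                            # 5 3
```

The split of 5 points above and 3 below shows that "about half above, half below" is a guide rather than a strict rule: a good line balances the distances as well as the counts.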

Making predictions

Once you have a line of best fit, you can predict a missing value.

Worked example. Using the revision line, estimate the test mark for a student who revised 4.5 hours.

  1. Find 4.5 on the x-axis.
  2. Go straight up to the line of best fit.
  3. Read across to the y-axis: you would read roughly 62%.

Predicting within the range of the data (here, 1 to 8 hours) is called interpolation and is fairly reliable. Predicting far outside the data (say, the mark after 40 hours) is called extrapolation and is risky, because the trend may not continue: marks cannot keep rising past 100%.
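The reading-off steps amount to substituting x into the equation of the line. The sketch below assumes a line with gradient 6.79 and intercept 30.46, which is a least-squares fit to the eight revision points rounded to two decimal places; a line drawn by eye will differ slightly, which is why the worked example reads roughly 62% while this line gives about 61%. The function name `predict` is illustrative.

```python
# Assumed line of best fit: mark = GRADIENT * hours + INTERCEPT
# (least-squares fit to the revision table, rounded to 2 d.p.).
GRADIENT = 6.79
INTERCEPT = 30.46

# Range of x-values actually present in the data.
X_MIN, X_MAX = 1, 8


def predict(hours):
    """Return (predicted mark, kind of prediction) for a number of hours."""
    mark = GRADIENT * hours + INTERCEPT
    kind = "interpolation" if X_MIN <= hours <= X_MAX else "extrapolation"
    return mark, kind


for h in (4.5, 40):
    mark, kind = predict(h)
    print(f"{h} hours -> {mark:.0f}% ({kind})")
# 4.5 hours -> 61% (interpolation)
# 40 hours -> 302% (extrapolation)
```

A predicted mark of about 302% is impossible, which is exactly the kind of nonsense extrapolation can produce when a trend is pushed far beyond the data.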

Correlation is not causation

This is one of the most important warnings in statistics. A strong correlation does not prove that one variable causes the other.

A classic example: across a summer, ice cream sales and drowning incidents rise together, giving a strong positive correlation. Ice cream obviously does not cause drowning. A hidden third factor, hot weather, drives both: heat makes people buy ice cream and go swimming. Always ask whether a lurking variable could explain a correlation before claiming a cause.
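A small simulation can show how a lurking variable produces a correlation on its own. Everything in the sketch below is synthetic: the temperatures, the formulas and the noise levels are invented purely for illustration and are not real sales or swimming figures. Temperature feeds into both quantities, but neither quantity appears in the other's formula.

```python
import random
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


rng = random.Random(0)  # fixed seed so every run gives the same numbers

# SYNTHETIC data for 1000 simulated days (all values are made up).
temperature = [rng.uniform(15, 35) for _ in range(1000)]
# Heat drives ice cream sales and the number of swimmers, plus random noise.
ice_creams = [10 * t + rng.gauss(0, 20) for t in temperature]
swimmers = [5 * t + rng.gauss(0, 10) for t in temperature]

# Across all days the two quantities rise and fall together.
r_all = pearson_r(ice_creams, swimmers)

# Now look only at days with nearly the same temperature (24 to 26 degrees),
# which holds the lurking variable roughly fixed.
similar = [(i, s) for t, i, s in zip(temperature, ice_creams, swimmers)
           if 24 <= t <= 26]
r_fixed = pearson_r([i for i, _ in similar], [s for _, s in similar])

print(f"all days: r = {r_all:.2f}")
print(f"similar-temperature days: r = {r_fixed:.2f}")
```

In this made-up setup the correlation across all days is strong, yet it largely disappears once temperature is held roughly constant, which is what you would expect if hot weather, not ice cream, is doing the work.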

Activity: collect and plot

  1. Collect paired data from classmates, such as arm span and height, or hours of sleep and a reaction-time test.
  2. Plot a scatter graph with sensible axes.
  3. Describe the correlation: positive, negative or zero, and strong or weak.
  4. Draw a line of best fit and use it to predict a value.
  5. Discuss whether the relationship looks like genuine cause and effect, or whether a third factor might be at work.

Why this matters

Scatter graphs are how scientists, economists and doctors test whether two things are connected, from diet and health to advertising and sales. The skill of reading a trend, predicting from it, and resisting the leap from correlation to causation is one of the most useful in all of data handling. Build the plotting skills further with the coordinate plane, and connect the line of best fit to straight line graphs and gradients.

Quick quiz

On a scatter graph, points that rise from bottom-left to top-right show...

More hours of exercise linked to lower resting heart rate is an example of...

What is the purpose of a line of best fit?

Ice cream sales and drowning incidents both rise in summer. This shows that...

FAQ

What is correlation?

Correlation describes how two variables are related. Positive correlation means they increase together, negative means one rises as the other falls, and zero means no clear link.

Does correlation prove causation?

No. Correlation shows a relationship in the data, but it does not prove causation. A hidden third factor may be driving both variables.