AI 🔬 · Ages 11-13 · Intermediate · 10 min read

Decision Trees Explained

A clear guide to decision trees in machine learning: how a model learns yes/no questions to sort data, why they are easy to read, and where they go wrong.

Key takeaways

  • A decision tree sorts data by asking a chain of yes/no questions
  • Each question splits the data to make the groups purer, closer to one answer
  • Trees are easy for humans to read and explain, unlike many other models
  • A tree that grows too deep can memorise the training data and fail on new data
  • Many slightly different trees combined into a 'forest' usually beat a single tree

A model you can actually read

Many AI models are like a sealed box: data goes in, an answer comes out, and it is hard to see why. A decision tree is refreshingly different. It makes its decisions by asking a chain of simple yes/no questions, and you can read every one of them. That makes decision trees one of the easiest machine-learning models to understand, and a great place to see how a computer can learn a sorting rule from examples.

If you have not yet met the idea of a computer learning from examples, the lesson What Is Machine Learning? is a good warm-up. Here we zoom into one specific and very useful kind of model.

The big idea: a flowchart of questions

Imagine you want to decide whether to take an umbrella. You might think:

  1. Is it cloudy? If no, leave the umbrella. If yes, ask the next question.
  2. Is the chance of rain in the forecast above 50%? If no, leave it. If yes, take the umbrella.

That is a decision tree. You start at the top (the "root"), answer a question, and follow the matching branch down to the next question, until you reach a final answer at the bottom (a "leaf"). Each leaf gives a decision: umbrella, or no umbrella.
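Written out by hand, the umbrella tree is just a pair of nested yes/no checks. Here is a minimal Python sketch. The function name `umbrella_decision` and the choice to give the rain chance as a number between 0 and 1 are our own assumptions for illustration.

```python
def umbrella_decision(cloudy: bool, rain_chance: float) -> str:
    """Follow the umbrella tree from the root question down to a leaf."""
    # Root question: is it cloudy?
    if not cloudy:
        return "no umbrella"  # leaf
    # Second question, only reached on the "cloudy" branch
    if rain_chance > 0.5:
        return "umbrella"  # leaf
    return "no umbrella"  # leaf


print(umbrella_decision(cloudy=True, rain_chance=0.8))   # prints "umbrella"
print(umbrella_decision(cloudy=True, rain_chance=0.2))   # prints "no umbrella"
print(umbrella_decision(cloudy=False, rain_chance=0.9))  # prints "no umbrella"
```

In a learned tree, the computer would choose these questions and the 0.5 cut-off itself instead of a person typing them in.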

A decision tree in AI works exactly like this, with one important twist: the computer figures out the questions by itself from training data. Nobody hand-writes "is it cloudy?" The machine discovers which questions are most useful.

How a tree learns its questions

Suppose you want a tree that predicts whether a fruit is an apple or an orange. You have a table of examples, each listing the fruit's colour, size and skin texture, along with the correct label.

The tree learns by hunting for the best question to ask first. It tries many possible splits:

  • "Is the colour orange?"
  • "Is the width more than 7 cm?"
  • "Is the skin bumpy?"

For each option, it checks how well the question separates apples from oranges. A great question creates two groups that are as pure as possible, meaning one branch is almost all apples and the other is almost all oranges. The tree picks the question that gives the cleanest split.
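The lesson does not say how "pure" is measured. One common choice is Gini impurity: for a group, subtract the squared share of each label from 1. A group of all apples scores 0, and a 50/50 mix scores 0.5. The sketch below uses Gini impurity to score the three candidate questions on a tiny fruit table we invented for illustration. The table includes an unripe green orange, so colour alone is not a perfect split. Entropy is another popular measure, and nothing here is a specific library's method.

```python
from collections import Counter

# A tiny, made-up training table (invented for illustration, not real measurements).
fruits = [
    {"colour": "red",    "width_cm": 7.0, "bumpy": False, "label": "apple"},
    {"colour": "green",  "width_cm": 6.5, "bumpy": False, "label": "apple"},
    {"colour": "red",    "width_cm": 8.0, "bumpy": False, "label": "apple"},
    {"colour": "orange", "width_cm": 7.5, "bumpy": True,  "label": "orange"},
    {"colour": "orange", "width_cm": 6.0, "bumpy": True,  "label": "orange"},
    {"colour": "orange", "width_cm": 8.5, "bumpy": True,  "label": "orange"},
    {"colour": "green",  "width_cm": 7.0, "bumpy": True,  "label": "orange"},  # unripe
]

def gini(labels):
    """Gini impurity: 0.0 means the group is pure (all one label)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def split_score(rows, question):
    """Size-weighted impurity of the two groups a yes/no question creates (lower is better)."""
    yes = [r["label"] for r in rows if question(r)]
    no = [r["label"] for r in rows if not question(r)]
    n = len(rows)
    return len(yes) / n * gini(yes) + len(no) / n * gini(no)

# The three candidate questions from the text, written as yes/no functions.
questions = {
    "Is the colour orange?": lambda r: r["colour"] == "orange",
    "Is the width more than 7 cm?": lambda r: r["width_cm"] > 7,
    "Is the skin bumpy?": lambda r: r["bumpy"],
}

scores = {text: split_score(fruits, q) for text, q in questions.items()}
for text, score in scores.items():
    print(f"{score:.3f}  {text}")
# 0.214  Is the colour orange?
# 0.476  Is the width more than 7 cm?
# 0.000  Is the skin bumpy?
best = min(scores, key=scores.get)
print("Best first question:", best)  # Best first question: Is the skin bumpy?
```

On this invented table, the skin question gives two perfectly pure groups, so it wins. The green orange keeps the colour question from doing the same.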

Then it repeats the whole process on each branch. Maybe after splitting on colour, one group is still mixed, so the tree asks a second question, like "Is it bigger than a tennis ball?" to sort it further. It keeps splitting until each final group is pure enough, or until it runs out of useful questions. The result is a tree of questions, learned entirely from the examples.
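Repeating the search on every branch gives a recursive procedure. The sketch below is one simple way to write it, assuming Gini impurity as the purity measure and a made-up table that has only colour and width. It stops when a group is pure, when no question makes the groups any purer, or when an optional `max_depth` limit is reached. `build_tree`, `candidate_questions` and `max_depth` are our own illustrative names. Real libraries add many refinements, and this is not how any particular one is implemented.

```python
from collections import Counter

# Made-up examples (not real measurements): only colour and width this time.
FRUITS = [
    {"colour": "red",    "width_cm": 7.0, "label": "apple"},
    {"colour": "red",    "width_cm": 7.5, "label": "apple"},
    {"colour": "green",  "width_cm": 6.5, "label": "apple"},
    {"colour": "orange", "width_cm": 8.0, "label": "orange"},
    {"colour": "orange", "width_cm": 7.5, "label": "orange"},
    {"colour": "green",  "width_cm": 8.5, "label": "orange"},  # an unripe orange
    {"colour": "orange", "width_cm": 7.0, "label": "orange"},
]

def gini(labels):
    """0.0 for a pure group; bigger numbers mean a more mixed group."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values()) if n else 0.0

def candidate_questions(rows, features):
    """Yield (text, test) pairs: '== value' for categories, '> value' for numbers."""
    for f in features:
        for v in sorted({r[f] for r in rows}):
            if isinstance(v, str):
                yield f"{f} == {v!r}?", lambda r, f=f, v=v: r[f] == v
            else:
                yield f"{f} > {v}?", lambda r, f=f, v=v: r[f] > v

def build_tree(rows, features, depth=0, max_depth=None):
    labels = [r["label"] for r in rows]
    majority = Counter(labels).most_common(1)[0][0]
    # Leaf: the group is already pure, or the depth limit is reached.
    if gini(labels) == 0.0 or (max_depth is not None and depth >= max_depth):
        return majority
    best = None
    for text, test in candidate_questions(rows, features):
        yes = [r for r in rows if test(r)]
        no = [r for r in rows if not test(r)]
        if not yes or not no:
            continue  # this question does not divide the group
        score = (len(yes) * gini([r["label"] for r in yes])
                 + len(no) * gini([r["label"] for r in no])) / len(rows)
        if best is None or score < best[0]:
            best = (score, text, yes, no)
    # Leaf: no question makes the groups any purer.
    if best is None or best[0] >= gini(labels):
        return majority
    _, text, yes, no = best
    return {
        "question": text,
        "yes": build_tree(yes, features, depth + 1, max_depth),
        "no": build_tree(no, features, depth + 1, max_depth),
    }

def show(node, indent=""):
    """Print the tree as an indented outline of questions and leaves."""
    if not isinstance(node, dict):
        print(f"{indent}-> {node}")
        return
    print(f"{indent}{node['question']}")
    for branch in ("yes", "no"):
        print(f"{indent}  {branch}:")
        show(node[branch], indent + "    ")

tree = build_tree(FRUITS, ["colour", "width_cm"])
show(tree)
```

On this invented table, the tree first asks whether the colour is orange. It then splits the mixed "not orange" group by width, much like the tennis-ball question in the text.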

Why people love decision trees

The greatest strength of a decision tree is that it is transparent. You can follow the exact path of questions that led to any decision and explain it in plain words: "We predicted apple because the fruit was red, larger than 6 cm, and had a smooth skin."
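Every prediction is just a path from the root to a leaf, so a program can record that path and turn it into an explanation. The sketch below walks a tiny hand-written tree and collects each question with its answer. The tree is hypothetical; a real one would be learned from data. `TREE` and `predict_with_reasons` are our own illustrative names.

```python
# A small hand-written tree, purely hypothetical: in practice the questions would be learned.
# Inner nodes are dicts with a question, a test, and "yes"/"no" branches; leaves are labels.
TREE = {
    "question": "Is the skin bumpy?",
    "test": lambda fruit: fruit["bumpy"],
    "yes": "orange",
    "no": {
        "question": "Is the colour orange?",
        "test": lambda fruit: fruit["colour"] == "orange",
        "yes": "orange",
        "no": "apple",
    },
}

def predict_with_reasons(node, fruit):
    """Follow one path from the root to a leaf, recording each question and its answer."""
    reasons = []
    while isinstance(node, dict):
        answer = node["test"](fruit)
        reasons.append(f"{node['question']} {'yes' if answer else 'no'}")
        node = node["yes"] if answer else node["no"]
    return node, reasons

label, reasons = predict_with_reasons(TREE, {"colour": "red", "bumpy": False})
print("Prediction:", label)  # Prediction: apple
for reason in reasons:
    print("  because:", reason)
# because: Is the skin bumpy? no
# because: Is the colour orange? no
```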

That matters a lot in the real world. If a model helps decide who gets a loan or what medical follow-up a patient needs, people deserve to know why. A tree can show its reasoning, while many other models cannot. This connects to the bigger conversation in AI Ethics and Fairness: being able to explain a decision is part of making AI fair and accountable.

Trees are also quick to train and to use. They handle both categories (like colour) and numbers (like width), and they do not need numbers rescaled to a common range first. That makes them a popular first choice for data that lives in tables.
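Rescaling does not matter because, for a number, a tree only ever asks "is this bigger than some threshold?". Multiplying every value by the same positive number, say switching from centimetres to millimetres, keeps the values in the same order. So the same question, converted to the new unit, puts every fruit in the same group. Here is a minimal check with made-up widths:

```python
# Made-up fruit widths, first in centimetres, then the same fruits in millimetres.
widths_cm = [6.5, 7.0, 7.5, 8.0]
widths_mm = [w * 10 for w in widths_cm]

# The same question asked in each unit: "wider than 7.2 cm?" and "wider than 72 mm?"
groups_cm = [w > 7.2 for w in widths_cm]
groups_mm = [w > 72 for w in widths_mm]

print(groups_cm)               # [False, False, True, True]
print(groups_cm == groups_mm)  # True
```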

Where decision trees go wrong

No model is perfect, and trees have a famous weakness: they can grow too deep.

If you let a tree keep adding questions, it will eventually carve the training data into tiny groups, one for almost every example. At that point it gets the training data perfectly right, but it has basically memorised the answers instead of learning a general rule. Shown a new fruit it has never seen, this overgrown tree often guesses badly. This problem, where a model fits its training data too tightly and fails on new data, is called overfitting, and it is one of the most important ideas in all of machine learning.
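To see overfitting happen, the sketch below grows two trees from the same tiny, invented set of fruit widths. The set includes one odd 8.2 cm "apple" among the oranges, standing in for a one-off or a labelling slip. One tree may keep splitting until every group is pure. The other is limited to a single question, using `max_depth`, our own name for that limit. All numbers are made up purely to show the effect.

```python
from collections import Counter

# Invented widths in cm. The 8.2 cm "apple" plays the role of an odd one-off or labelling slip.
TRAIN = [(6.0, "apple"), (6.4, "apple"), (6.8, "apple"), (7.1, "apple"),
         (7.6, "orange"), (7.9, "orange"), (8.2, "apple"), (8.5, "orange"), (8.8, "orange")]
# New fruits the trees never saw, labelled by the general rule "big ones are oranges" (also invented).
TEST = [(6.6, "apple"), (7.0, "apple"), (8.0, "orange"), (8.3, "orange")]

def gini(labels):
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values()) if n else 0.0

def build(points, depth=0, max_depth=None):
    """Grow a width-only tree; a node is (threshold, left, right), a leaf is a label."""
    labels = [label for _, label in points]
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or (max_depth is not None and depth >= max_depth):
        return majority
    xs = sorted({x for x, _ in points})
    best = None
    for lo, hi in zip(xs, xs[1:]):
        t = (lo + hi) / 2  # try a threshold halfway between neighbouring widths
        left = [lab for x, lab in points if x <= t]
        right = [lab for x, lab in points if x > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(points)
        if best is None or score < best[0]:
            best = (score, t)
    if best is None:  # all widths identical: nothing left to split on
        return majority
    t = best[1]
    return (t,
            build([p for p in points if p[0] <= t], depth + 1, max_depth),
            build([p for p in points if p[0] > t], depth + 1, max_depth))

def predict(node, x):
    while isinstance(node, tuple):
        t, left, right = node
        node = left if x <= t else right
    return node

def accuracy(tree, data):
    return sum(predict(tree, x) == label for x, label in data) / len(data)

deep = build(TRAIN)                  # keeps splitting until every group is pure
shallow = build(TRAIN, max_depth=1)  # allowed only one question
for name, tree in [("deep", deep), ("shallow", shallow)]:
    print(name, "train:", round(accuracy(tree, TRAIN), 2), "test:", accuracy(tree, TEST))
# deep train: 1.0 test: 0.75
# shallow train: 0.89 test: 1.0
```

The deep tree scores perfectly on its training data but carves out a tiny "apple" zone around 8.2 cm, so it misjudges the new 8.3 cm fruit. The one-question tree makes one training mistake, yet it gets all four new fruits right. Limiting depth like this is one common way to stop a tree from memorising.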

Trees can also be unstable. Change just a few training examples and the whole tree might reshape itself, asking quite different questions. A single tree can be a bit fragile.

The clever fix: a forest of trees

Engineers found a neat way around these weaknesses. Instead of trusting one tree, you build many slightly different trees and let them vote. Each tree is trained on a random slice of the data and may only choose from a random selection of the possible questions, so each one sees the problem a little differently.

When a new example arrives, every tree gives its answer, and the majority vote wins. This team of trees is called a random forest. Because the trees make different mistakes, their errors tend to cancel out, so a forest is usually more accurate and more stable than a single tree. A related method called gradient boosting, which is often even more accurate, builds trees one after another. Each new tree focuses on fixing the mistakes the earlier trees still make.
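Here is a minimal random-forest sketch in plain Python. Each tree gets a random slice of the rows, drawn with replacement, and a random feature to ask about, and then all the trees vote. To stay short, each "tree" is a stump that asks just one question. Real forests usually grow much deeper trees and pick a fresh random set of features at every split. The data, the names `train_stump`, `train_forest` and `forest_predict`, and settings such as 25 trees are our own illustrative assumptions, not a library's API.

```python
import random
from collections import Counter

# Invented training data: (width_cm, bumpiness from 0 to 10, label).
DATA = [
    (6.5, 1, "apple"), (7.0, 2, "apple"), (7.2, 1, "apple"), (6.8, 3, "apple"),
    (7.8, 8, "orange"), (8.1, 7, "orange"), (7.5, 9, "orange"), (8.4, 6, "orange"),
]

def gini(labels):
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values()) if n else 0.0

def majority(labels):
    return Counter(labels).most_common(1)[0][0]

def train_stump(rows, feature):
    """A one-question tree: best 'feature > t?' split, with the majority label on each side."""
    best = None
    for t in sorted({r[feature] for r in rows}):
        yes = [r[-1] for r in rows if r[feature] > t]
        no = [r[-1] for r in rows if r[feature] <= t]
        if not yes:
            continue  # t is the largest value, so nothing lands on the 'yes' side
        score = (len(yes) * gini(yes) + len(no) * gini(no)) / len(rows)
        if best is None or score < best[0]:
            best = (score, t, majority(yes), majority(no))
    if best is None:  # every row has the same value for this feature
        only = majority([r[-1] for r in rows])
        return lambda x: only
    _, t, yes_label, no_label = best
    return lambda x: yes_label if x[feature] > t else no_label

def train_forest(rows, n_trees=25, seed=0):
    rng = random.Random(seed)
    n_features = len(rows[0]) - 1  # the last item of each row is the label
    forest = []
    for _ in range(n_trees):
        sample = [rng.choice(rows) for _ in rows]  # random slice: draw rows with replacement
        feature = rng.randrange(n_features)        # random question type for this tree
        forest.append(train_stump(sample, feature))
    return forest

def forest_predict(forest, x):
    """Every tree votes; the most common answer wins."""
    votes = [tree(x) for tree in forest]
    return majority(votes), Counter(votes)

forest = train_forest(DATA)
label, tally = forest_predict(forest, (6.6, 2))
print(label, dict(tally))
```

Printing the tally as well as the winner shows how much the trees agree. A close vote is a hint that the forest is less sure about that example.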

This "wisdom of the crowd" effect only works because the trees are different from each other. If every tree asked the same questions, they would all make the same mistakes, and voting would change nothing. By training each tree on a random slice of the data and letting it consider only a random handful of possible questions at each split, the forest forces variety. It is a bit like asking many people with different backgrounds to vote: their disagreements are exactly what make the group's overall answer more reliable than any one person's.
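The crowd effect can be put into numbers under a strong simplifying assumption. Suppose each tree is right with the same probability p, and the trees' mistakes are completely independent. Real trees in a forest are only partly independent, so real forests typically gain less than this idealised calculation suggests. The 70% accuracy below is a made-up figure, and `majority_right` is our own name.

```python
from math import comb

def majority_right(n_trees: int, p: float) -> float:
    """Chance that a majority of n_trees voters is right, if each is right with
    probability p and their mistakes are completely independent (an idealisation)."""
    if n_trees % 2 == 0:
        raise ValueError("use an odd number of trees so there are no ties")
    need = n_trees // 2 + 1  # smallest number of correct votes that wins
    return sum(comb(n_trees, k) * p**k * (1 - p) ** (n_trees - k)
               for k in range(need, n_trees + 1))

for n in (1, 5, 25):
    print(n, "independent trees:", round(majority_right(n, 0.7), 3))
# If every tree made exactly the same mistakes, the vote would stay at 0.7 however many trees voted.
```

With the made-up 70% figure, one tree is right 70% of the time, while five independent trees voting together are right about 84% of the time (0.837). Twenty-five trees do better still.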

|  | Single tree | Random forest |
|---|---|---|
| Accuracy | Decent | Usually much better |
| Easy to read? | Very easy | Harder (many trees) |
| Stability | Can be fragile | Robust |

So decision trees teach two big lessons at once. First, a computer can learn a clear, readable rule just by finding the best questions to ask. Second, a crowd of simple models, each imperfect, can together be smarter than any one of them. Both ideas show up again and again across AI.

Quick quiz


How does a decision tree reach an answer?

What is the tree trying to do at each split?

Why are decision trees popular for explaining decisions?

What goes wrong if a tree grows too deep?

What is a 'random forest'?

FAQ

Tree-based models are still very useful today. For data laid out in tables, like spreadsheets of numbers and categories, tree-based methods such as random forests and gradient boosting are often more accurate than neural networks and easier to explain. Each tool suits different jobs.

To choose its questions, a tree tests many possible ones and measures which best separates the data into purer groups. It picks that question for the top split, then repeats the process for each branch below it.