AI · Ages 11-13 · Beginner · 9 min read

What Are Tokens in AI?

Understand tokens: the chunks of text an AI reads and writes. Learn how words split into tokens, why context windows and pricing use them, and their real limits.

Key takeaways

  • A token is a chunk of text, often a word or part of a word, not always a whole word
  • AI models read and write in tokens, then turn them into numbers to process
  • The context window is the maximum number of tokens a model can handle at once
  • Token count drives both speed and cost when using AI services
  • Tokenization is fixed and dumb: it splits text by patterns, not by meaning

The AI does not see letters or words

When you read this sentence, you see words. An AI language model does not work that way. Before it can process your text, the text is chopped into pieces called tokens. A token is a small chunk: often a whole word, sometimes just part of a word, sometimes punctuation. This chopping step is called tokenization.

If you have read How Large Language Models Are Trained, you have met the idea that AI predicts the next piece of text. That "piece" is a token.

What a token actually looks like

Take the sentence: The cat sat.

A tokenizer might split it like this:

Text chunk    Is it a token?
The           yes
cat           yes (the space is included)
sat           yes
.             yes

So that short sentence is about 4 tokens. Notice that spaces are usually glued to the start of a word; that is just how many tokenizers work.

Now take an unusual word like unbelievable. The tokenizer might break it into un, believ, able. The model has seen those pieces before, so it can handle a long word it rarely encounters by building it from familiar parts. This is the clever part: a small set of tokens can spell almost any word.
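To see how a long word can be built from familiar parts, here is a minimal sketch in Python. The tiny vocabulary `VOCAB` and the function `split_word` are made up for this example. Real tokenizers learn their pieces from huge amounts of text and may split words differently.

```python
# A made-up vocabulary of known pieces (an assumption for this example).
VOCAB = {"un", "believ", "able", "cat"}

def split_word(word, vocab):
    """Split a word by always taking the longest known piece next."""
    pieces = []
    start = 0
    while start < len(word):
        # Try the longest possible piece first, then shorter ones.
        for end in range(len(word), start, -1):
            piece = word[start:end]
            if piece in vocab:
                pieces.append(piece)
                start = end
                break
        else:
            # No known piece fits here, so fall back to a single letter.
            pieces.append(word[start])
            start += 1
    return pieces

print(split_word("unbelievable", VOCAB))  # ['un', 'believ', 'able']
print(split_word("dog", VOCAB))           # ['d', 'o', 'g']
```

The word "dog" is not in this toy vocabulary, so it falls back to single letters. That fallback is why a small set of pieces can still spell almost any word.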

From tokens to numbers

The model cannot do math on the word cat. So each token is mapped to a number (an ID), and that number points to a list of values the model has learned. Everything after that, including the prediction and the "thinking", happens with numbers. When the model replies, it predicts tokens one at a time, then those tokens are turned back into readable text for you.

So the real loop is: your text → tokens → numbers → prediction → tokens → text.
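The two ends of that loop, turning tokens into numbers and numbers back into text, can be sketched in a few lines of Python. The ID numbers in `TOKEN_TO_ID` are invented for this example, and the prediction step in the middle is left out because that is the part the model itself does.

```python
# Made-up ID numbers for each token (an assumption for this example).
TOKEN_TO_ID = {"The": 0, " cat": 1, " sat": 2, ".": 3}
# The reverse lookup: from ID number back to token.
ID_TO_TOKEN = {token_id: token for token, token_id in TOKEN_TO_ID.items()}

def encode(tokens):
    """Turn a list of tokens into a list of ID numbers."""
    return [TOKEN_TO_ID[token] for token in tokens]

def decode(ids):
    """Turn a list of ID numbers back into readable text."""
    return "".join(ID_TO_TOKEN[token_id] for token_id in ids)

tokens = ["The", " cat", " sat", "."]
ids = encode(tokens)
print(ids)          # [0, 1, 2, 3]
print(decode(ids))  # The cat sat.
```

Because the space is stored inside tokens like " cat", gluing the pieces back together gives the original sentence with no extra work.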

The context window: a hard memory limit

A model can only handle so many tokens at once. That maximum is its context window. Everything in the conversation (your prompt, any files you paste, and the model's own reply) must fit inside it.

This has real consequences:

  • In a long chat, the earliest messages can slide out of the window. The model is not forgetting like a person; those tokens simply no longer fit, so it acts as if they were never said.
  • A huge document may be too long to paste in one go and must be split.
  • "Bigger context window" is a real selling point for models, because it means they can read more at once.
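Here is a rough sketch of how early messages can slide out of the window. It assumes one simple strategy: keep the newest messages that fit and drop the rest. Real services may handle this differently. The helper `count_tokens` just counts words as a stand-in, and both function names are made up for this example.

```python
def count_tokens(message):
    # Rough stand-in: count words. Real tokenizers count differently.
    return len(message.split())

def fit_to_window(messages, max_tokens):
    """Keep the newest messages whose total token count fits in max_tokens."""
    kept = []
    total = 0
    # Walk backwards from the newest message to the oldest.
    for message in reversed(messages):
        size = count_tokens(message)
        if total + size > max_tokens:
            break  # this message and everything older no longer fits
        kept.append(message)
        total += size
    kept.reverse()  # put the kept messages back in their original order
    return kept

chat = ["My name is Sam", "I like robots", "What is my name?"]
print(fit_to_window(chat, max_tokens=8))
# ['I like robots', 'What is my name?']
```

With a window of only 8 tokens, the first message is dropped, so the model would never see the name it is being asked about.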

Why you hear about tokens when paying for AI

If you use an AI service through its developer tools (often called an API), you will often see prices listed per token, split into input tokens (what you send) and output tokens (what it writes). More tokens mean more computing work, so:

  • A long prompt costs more than a short one.
  • A long answer costs more than a short one.
  • Trimming filler words genuinely lowers token count.

Token count also affects speed: the model writes roughly one token at a time, so a 500-token answer takes longer to appear than a 50-token one.
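The arithmetic behind cost and speed can be sketched like this. Every number here is invented for illustration: the prices are in pretend "coins", and the writing speed of 50 tokens per second is an assumption, not a figure for any real service.

```python
# Hypothetical prices and speed, for illustration only.
COINS_PER_1000_INPUT_TOKENS = 1
COINS_PER_1000_OUTPUT_TOKENS = 2
TOKENS_PER_SECOND = 50

def estimate_cost(input_tokens, output_tokens):
    """Estimate the cost in pretend coins for one request."""
    total = (input_tokens * COINS_PER_1000_INPUT_TOKENS
             + output_tokens * COINS_PER_1000_OUTPUT_TOKENS)
    return total / 1000

def estimate_seconds(output_tokens):
    """Estimate how long the answer takes to appear."""
    return output_tokens / TOKENS_PER_SECOND

print(estimate_cost(200, 500))   # long answer
print(estimate_cost(200, 50))    # short answer, same prompt
print(estimate_seconds(500))     # 10.0
print(estimate_seconds(50))      # 1.0
```

With these made-up numbers, the 500-token answer costs more and takes ten times longer to appear than the 50-token one.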

The honest limits of tokenization

Tokenization is fixed and, frankly, a bit dumb. It splits text by frequency patterns, not by meaning. This causes quirks worth knowing:

  • The model can struggle to count letters in a word, because it sees tokens, not letters. Ask it how many rs are in "strawberry" and it may slip.
  • Spelling and rhyming tasks can trip it up for the same reason.
  • The same word can split differently depending on spacing or capitalisation.
  • Other languages, code, and emoji can use far more tokens than plain English, which affects cost and how much fits in the window.
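The letter-counting quirk is easier to see side by side. In this sketch, the split of "strawberry" into three pieces and the ID numbers in `PIECE_IDS` are made up. A real tokenizer may split the word differently.

```python
word = "strawberry"
pieces = ["str", "aw", "berry"]                    # hypothetical split
PIECE_IDS = {"str": 11, "aw": 22, "berry": 33}     # made-up ID numbers

def count_letter(text, letter):
    """Count a letter the ordinary way, by looking at the letters."""
    return text.count(letter)

# What an ordinary program sees: the letters themselves.
print(count_letter(word, "r"))   # 3

# What the model receives: ID numbers, with no letters in sight.
what_model_sees = [PIECE_IDS[piece] for piece in pieces]
print(what_model_sees)           # [11, 22, 33]
```

An ordinary program counts the letters directly. The model only gets the list of numbers, so it has to rely on what it learned about each piece during training.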

None of this means the model is broken. It just means tokens are a mechanical step, not real reading. Knowing about tokens helps you understand why an AI sometimes nails an essay yet fumbles counting the letters in a single word.

Quick quiz

What is a token?

Why split words into smaller tokens?

What is a context window?

What happens when a chat gets longer than the context window?

Does the AI understand tokens the way you understand words?

FAQ

As a rough guide, one common English word is about 1.3 tokens, so 100 words is roughly 130 tokens. Short common words are often a single token, while long or unusual words split into several. The exact split depends on the model's tokenizer, so treat any rule of thumb as approximate.
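That rule of thumb is easy to turn into a quick estimate. This is only a sketch: the function name `estimate_tokens` is made up, it counts words by splitting on spaces, and the 1.3 figure is the rough guide from above, not an exact value for any tokenizer.

```python
TOKENS_PER_WORD = 1.3  # rough guide for common English text

def estimate_tokens(text):
    """Guess the token count from the number of words."""
    words = len(text.split())
    return round(words * TOKENS_PER_WORD)

hundred_words = "word " * 100
print(estimate_tokens(hundred_words))  # 130
```

To get the real count, you would need to run the text through the model's own tokenizer.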

Many AI services charge by the token, counting both what you send (input) and what the model writes back (output). More tokens mean more computing work, so longer prompts and longer answers cost more and take longer. Trimming needless words can genuinely save tokens.