AI · Ages 14-18 · Advanced · 12 min read

How Large Language Models Are Trained

A clear, de-hyped guide to how large language models are trained: tokens, pretraining on next-word prediction, fine-tuning, RLHF, plus the real costs and limits.

Key takeaways

  • A large language model is trained mainly to predict the next token (word piece) over and over
  • Pretraining uses huge amounts of text and adjusts billions of internal numbers called parameters
  • Fine-tuning and human feedback (RLHF) shape the raw model into a helpful, safer assistant
  • The model learns statistical patterns of language, not verified facts, so it can be fluent and wrong
  • Training costs enormous compute, energy and human labour, and the result still has real limits

What we are actually talking about

A large language model, or LLM, is the engine behind modern AI chatbots and writing tools. People describe these systems in dramatic ways, so it helps to cut through the hype and look at what really happens when one is trained. The short version: an LLM is trained to predict the next chunk of text, over and over, across an almost unimaginable amount of writing, until it becomes startlingly good at producing language. Everything else is built on top of that one idea.

If you have not yet seen how chatbots use these models to hold a conversation, How Chatbots Work is a good companion to this lesson. Here we go under the hood and trace how the model itself is built.

Step 1: Turning text into tokens

A model cannot read letters the way you do. First, all the training text is broken into tokens: small chunks that are often whole words but sometimes word fragments. The word "running" might become "run" and "ning", although the exact split depends on the tokenizer. Punctuation becomes tokens too, and spaces are usually folded into the token that follows them. A useful rule of thumb is that one token is roughly three-quarters of a word in English, so 100 tokens is about 75 words.
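To make the idea concrete, here is a minimal sketch of a tokenizer that always grabs the longest piece it recognises. The vocabulary `TOY_VOCAB` and the function `tokenize` are invented for this lesson, and this toy version treats each space as its own token for simplicity. Real tokenizers learn their vocabularies from data and work differently in detail.

```python
# Toy greedy longest-match tokenizer (an illustration, not a real tokenizer).
# TOY_VOCAB is a hand-made, hypothetical vocabulary; real ones are learned
# from data and contain tens of thousands of pieces.
TOY_VOCAB = {"run", "ning", "the", "dog", "is", " ", "."}


def tokenize(text, vocab=TOY_VOCAB):
    """Split text into the longest known pieces, reading left to right."""
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, then shorter and shorter ones.
        for j in range(len(text), i, -1):
            if text[i:j] in vocab:
                tokens.append(text[i:j])
                i = j
                break
        else:
            # Nothing matched: keep the single character as its own token.
            tokens.append(text[i])
            i += 1
    return tokens


print(tokenize("the dog is running."))
```

Joining the tokens back together always gives the original text, which is the property a tokenizer needs: nothing is lost, the text is just cut into pieces.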

Each token is then turned into a list of numbers, because, like all AI, the model only works with data in numerical form. This numerical version of a token is called an embedding, and it positions each token in a kind of mathematical space where related words sit near each other.
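A small sketch of what "near each other" means: the three-number embeddings below are hand-picked and hypothetical (real embeddings are learned and have hundreds or thousands of numbers). Cosine similarity is one common way to measure how closely two such lists point in the same direction.

```python
import math

# Hypothetical, hand-picked embeddings for illustration only.
TOY_EMBEDDINGS = {
    "cat": [0.9, 0.8, 0.1],
    "dog": [0.8, 0.9, 0.2],
    "car": [0.1, 0.2, 0.9],
}


def cosine_similarity(a, b):
    """Return a score near 1.0 when two vectors point in a similar direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


cat_dog = cosine_similarity(TOY_EMBEDDINGS["cat"], TOY_EMBEDDINGS["dog"])
cat_car = cosine_similarity(TOY_EMBEDDINGS["cat"], TOY_EMBEDDINGS["car"])
print(f"cat vs dog: {cat_dog:.2f}, cat vs car: {cat_car:.2f}")
```

With these made-up numbers, "cat" scores closer to "dog" than to "car", which is the kind of relationship a trained model's embeddings tend to capture.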

Step 2: Pretraining, the main event

Now the real learning begins, and it is conceptually simple even though it is technically gigantic. The model is shown a stretch of text with the next token hidden, and it has to guess that token. Then the real token is revealed, the model's guess is compared to the truth, and its internal numbers are nudged so that the next guess is a little better. This is the same learn-from-examples loop you see in supervised learning; the clever part is that the "label" is simply the next token already sitting in the text, so no human has to label anything. That is why this setup is often called self-supervised learning.
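The guess, compare, nudge loop can be sketched in a few lines. This is a deliberately tiny stand-in, not how a real LLM is built: it looks only at the single previous token, its "parameters" are one table of scores, and the six-token `corpus` is invented. The update rule is plain gradient descent on the cross-entropy loss, which measures how surprised the model was by the true next token.

```python
import math

# Hypothetical toy corpus; real pretraining uses trillions of tokens.
corpus = ["the", "cat", "sat", "on", "the", "mat"]
vocab = sorted(set(corpus))
index = {tok: i for i, tok in enumerate(vocab)}
V = len(vocab)

# The model's "parameters": scores[prev][next], all starting at zero.
scores = [[0.0] * V for _ in range(V)]


def softmax(row):
    """Turn raw scores into probabilities that sum to 1."""
    m = max(row)
    exps = [math.exp(s - m) for s in row]
    total = sum(exps)
    return [e / total for e in exps]


def train_step(learning_rate=0.5):
    """One pass over the corpus: guess, compare with the truth, nudge."""
    total_loss = 0.0
    for prev_tok, next_tok in zip(corpus, corpus[1:]):
        p, n = index[prev_tok], index[next_tok]
        probs = softmax(scores[p])          # the model's guess
        total_loss += -math.log(probs[n])   # surprise at the true token
        for k in range(V):
            # Gradient of cross-entropy: predicted probability minus truth.
            grad = probs[k] - (1.0 if k == n else 0.0)
            scores[p][k] -= learning_rate * grad
    return total_loss / (len(corpus) - 1)


def predict_next(prev_tok):
    """Return the token the model currently thinks is most likely next."""
    probs = softmax(scores[index[prev_tok]])
    return vocab[probs.index(max(probs))]


losses = [train_step() for _ in range(50)]
print(f"loss went from {losses[0]:.2f} to {losses[-1]:.2f}")
print("after 'cat' the model predicts:", predict_next("cat"))
```

Notice that nobody supplied any labels: every training pair came straight out of the text itself.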

Those internal numbers are called parameters, and a large model has billions of them. They are arranged in a structure called a transformer, whose key trick, called attention, lets the model weigh which earlier words matter most when predicting the next one. This is what lets it keep track of context across a long passage rather than just the last word.
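Here is a rough sketch of the attention calculation for a single position, in the scaled dot-product form commonly used in transformers. All the numbers are invented; in a real model the query, key and value vectors are computed from the embeddings using learned parameters, and many attention "heads" run in parallel.

```python
import math


def softmax(row):
    """Turn raw scores into weights that sum to 1."""
    m = max(row)
    exps = [math.exp(s - m) for s in row]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, keys, values):
    """Weigh earlier tokens by how well their key matches the query."""
    d = len(query)
    # Dot product of the query with each key, scaled by sqrt(dimension).
    raw = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    weights = softmax(raw)
    # Blend the value vectors using those weights.
    mixed = [
        sum(w * value[i] for w, value in zip(weights, values))
        for i in range(len(values[0]))
    ]
    return weights, mixed


# Hypothetical vectors for three earlier tokens and one current query.
keys = [[2.0, 0.0], [0.0, 2.0], [0.1, 0.1]]
values = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
query = [1.0, 0.0]

weights, mixed = attend(query, keys, values)
print("attention weights:", [round(w, 2) for w in weights])
```

The first earlier token gets the largest weight because its key lines up best with the query, so its value contributes most to the blended result.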

Repeat this next-token game across trillions of tokens of books, articles, websites and code, and something remarkable emerges. To predict the next word well, the model is forced to pick up grammar, facts, writing styles, reasoning patterns, and the structure of many languages, because all of that helps it guess better. Nobody programmed those abilities in; they fell out of relentless next-word prediction. This stage is called pretraining, and it is by far the most expensive part.

Step 3: Fine-tuning the raw model

A freshly pretrained model is powerful but unruly. Ask it a question and it might continue with more questions, because in its training data, questions are often followed by more questions. It needs to be shaped into something that answers helpfully.

This is fine-tuning. The model is trained further on a smaller, carefully chosen set of examples that show the behaviour you want, such as a question followed by a clear, helpful answer. This teaches it the format of being an assistant rather than just a text predictor.
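As a small illustration, fine-tuning data can be pictured as question and answer pairs that get flattened into text the model then trains on. The examples and the "User:" / "Assistant:" template below are invented for this sketch; each real system uses its own format.

```python
# Hypothetical fine-tuning examples; real datasets are far larger.
examples = [
    {
        "question": "What is a token?",
        "answer": "A small chunk of text, often a word or part of a word.",
    },
    {
        "question": "What are parameters?",
        "answer": "The internal numbers a model adjusts during training.",
    },
]


def format_example(example):
    """Flatten one pair into training text using an invented template."""
    return f"User: {example['question']}\nAssistant: {example['answer']}"


training_texts = [format_example(e) for e in examples]
print(training_texts[0])
```

After seeing many examples shaped like this, the model learns that a question should be followed by an answer rather than by more questions.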

Step 4: Learning from human feedback (RLHF)

Fine-tuning on examples gets you partway. To make a model genuinely helpful, polite, and safer, engineers use reinforcement learning from human feedback, usually shortened to RLHF.

It works like this. The model produces several different answers to the same prompt. Human reviewers rank those answers from best to worst. Those rankings train a second model, a "reward model", to predict which answers people prefer. The main model is then tuned to produce more of the answers the reward model rates highly. Over many rounds, the model drifts toward responses humans find helpful and away from rude, dangerous, or useless ones.
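The ranking step can be sketched as follows. This is a simplified picture under stated assumptions: the answers are just labels, and the "reward model" is a single learnable score per answer, whereas a real reward model is a neural network that scores text it has never seen. The pairwise loss used here, which pushes the preferred answer's score above the other's, is one common formulation rather than the only one.

```python
import math
from itertools import combinations

# Hypothetical ranking from a human reviewer, best answer first.
ranked_answers = ["answer_a", "answer_b", "answer_c"]


def ranking_to_pairs(ranked):
    """Turn one ranking into (better, worse) pairs."""
    # combinations keeps the original order, so the better answer comes first.
    return list(combinations(ranked, 2))


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def train_reward_scores(pairs, steps=200, learning_rate=0.1):
    """Learn one score per answer so preferred answers score higher."""
    rewards = {answer: 0.0 for pair in pairs for answer in pair}
    for _ in range(steps):
        for better, worse in pairs:
            # Probability the current scores assign to the human's choice.
            p = sigmoid(rewards[better] - rewards[worse])
            # Gradient step on the loss -log(p).
            rewards[better] += learning_rate * (1.0 - p)
            rewards[worse] -= learning_rate * (1.0 - p)
    return rewards


pairs = ranking_to_pairs(ranked_answers)
rewards = train_reward_scores(pairs)
for answer in ranked_answers:
    print(answer, round(rewards[answer], 2))
```

Once scores like these exist, the main model can be tuned toward answers that score highly, which is the reinforcement learning part of RLHF.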

RLHF is also where a lot of the model's values and refusals come from. The choices those human reviewers make, what counts as helpful or harmful, get baked into the model. That carries the same risks covered in Training Data and Bias in AI and connects directly to AI Ethics and Fairness: whose judgement shapes the model, and who gets left out?

The honest costs and limits

It is easy to be dazzled, so here is the de-hyped reality.

The costs are enormous. Training a frontier model can use thousands of specialised chips running for weeks, consume large amounts of electricity and water for cooling, and cost tens of millions of dollars or more. There is also hidden human labour: people around the world are paid, often modestly, to label data and review outputs, sometimes including disturbing content. These are real ethical and environmental footprints, not abstractions.

The model predicts, it does not verify. Because its whole training rewards plausible text, an LLM can produce fluent, confident statements that are simply false. These are called hallucinations. The model is not lying, because it has no concept of truth to lie about; it is generating likely-sounding words. This is why you should treat its factual claims as a draft to check, never as a settled answer.
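A short sketch shows why this happens. Suppose a model has assigned the probabilities below to the next token after "The capital of Australia is". The numbers are invented for illustration, not measured from any real model. Generating text means picking from this distribution, and nothing in that step checks whether the pick is true.

```python
import random

# Hypothetical next-token probabilities; the values are made up.
next_token_probs = {"Canberra": 0.55, "Sydney": 0.35, "Melbourne": 0.10}


def sample_next_token(probs, rng):
    """Pick one token at random, weighted by its probability."""
    tokens = list(probs)
    weights = [probs[t] for t in tokens]
    return rng.choices(tokens, weights=weights, k=1)[0]


rng = random.Random(0)  # fixed seed so the run is repeatable
samples = [sample_next_token(next_token_probs, rng) for _ in range(1000)]
wrong = sum(1 for s in samples if s != "Canberra")
print(f"{wrong} of 1000 samples named the wrong city")
```

A wrong city comes out in the same fluent, confident form as the right one, which is exactly what a hallucination looks like from the outside.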

It is frozen after training. The model's parameters do not update while you chat. Its knowledge has a cutoff date, and it does not learn from your conversation in real time. Newer information must be fed in by other means.

It reflects its data. Patterns, gaps and biases in the training text show up in the output. A model trained mostly on one language or worldview will quietly favour it.

Why this matters to you

Understanding that an LLM is a next-token predictor, polished by fine-tuning and human feedback, changes how you use it. You stop expecting an oracle and start treating it as a fast, fluent, sometimes-wrong assistant that needs a human in charge. That mindset is the foundation of using these tools well. If you are curious how the same underlying idea powers image and text generation, see Generative AI: Images and Text, and if you want to build intuition by writing your own programs, explore coding.

Quick quiz

What is the core task a language model is trained on during pretraining?

What is a 'token' in this context?

What are 'parameters' in a language model?

What does RLHF (reinforcement learning from human feedback) do?

Why can a language model sound confident yet state something false?

FAQ

An LLM does not understand language in the human sense. It has learned extremely rich statistical patterns of how words relate, which lets it produce coherent, useful text. But it has no body, no lived experience, and no built-in fact-checker. Calling it 'understanding' is a useful shorthand, not a literal claim.

The model usually does not learn from you during the chat itself. Its parameters are fixed after training. Some companies may later use collected conversations to train future versions, but the model you are talking to is not updating itself as you type.