Pretraining
The initial training phase where a model learns general patterns from large amounts of raw data before being fine-tuned for specific tasks.
Pretraining is the first and most expensive step in building a foundation model. The model learns from vast amounts of raw data (text, images, or multimodal content), typically using self-supervised objectives such as next-token prediction.
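The self-supervised part can be sketched in a few lines: the training targets come from the data itself, so no human labels are needed. This is an illustrative toy (the function name and context length are invented for the example, not from any particular library):

```python
def next_token_pairs(tokens, context_len=3):
    """Build (context, target) training pairs from a raw token stream.

    Each target is simply the next token in the data, which is what
    makes next-token prediction self-supervised: the corpus provides
    its own labels.
    """
    pairs = []
    for i in range(len(tokens) - 1):
        # Context is the last `context_len` tokens up to position i.
        context = tuple(tokens[max(0, i - context_len + 1): i + 1])
        pairs.append((context, tokens[i + 1]))
    return pairs

corpus = "the cat sat on the mat".split()
for context, target in next_token_pairs(corpus):
    print(context, "->", target)
```

A real pretraining run does the same thing at scale: slide a context window over trillions of tokens and train the model to assign high probability to each next token.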
After pretraining, the model has a broad understanding of language, facts, and patterns. It can then be fine-tuned for specific tasks or aligned for specific behaviors.
Pretraining scale is measured in tokens (for text) or samples (for other modalities). GPT-4 and similar models were pretrained on trillions of tokens. Scaling laws suggest that more data and compute consistently improve model quality, though with diminishing returns.
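The diminishing-returns shape can be made concrete with a Chinchilla-style power law, where predicted loss falls as a power law in parameter count N and token count D. The constants below are illustrative (roughly the shape of published fits, not authoritative values):

```python
def scaling_law_loss(n_params, n_tokens,
                     E=1.69, A=406.4, B=410.7, alpha=0.34, beta=0.28):
    """Predicted pretraining loss under a Chinchilla-style power law.

    L(N, D) = E + A / N^alpha + B / D^beta

    E is an irreducible loss floor; the other two terms shrink as power
    laws in parameters N and training tokens D, so each doubling of
    data or model size buys less improvement than the last.
    """
    return E + A / n_params**alpha + B / n_tokens**beta

# Doubling the data keeps lowering predicted loss, but by less each time.
for d in (1e12, 2e12, 4e12):
    print(f"{d:.0e} tokens -> loss {scaling_law_loss(70e9, d):.4f}")
```

Running this shows both effects in the text above: quality improves monotonically with data, and the gap between successive doublings shrinks.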