Pretraining
The initial training phase where a model learns general patterns from large amounts of raw data before being fine-tuned for specific tasks.
Pretraining is the first and most expensive step in building a foundation model. The model learns from vast amounts of raw data (text, images, or multimodal content), typically using self-supervised objectives such as next-token prediction.
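The self-supervised part can be sketched in a few lines: the training targets come from the data itself, so no human labels are needed. This is an illustrative toy (the function name and context length are invented for the example, not from any particular library):

```python
def next_token_pairs(tokens, context_len=3):
    """Build (context, target) training pairs from a raw token stream.

    Each target is simply the next token in the data, which is what
    makes next-token prediction self-supervised: the corpus provides
    its own labels.
    """
    pairs = []
    for i in range(len(tokens) - 1):
        # Context is the last `context_len` tokens up to position i.
        context = tuple(tokens[max(0, i - context_len + 1): i + 1])
        pairs.append((context, tokens[i + 1]))
    return pairs

corpus = "the cat sat on the mat".split()
for context, target in next_token_pairs(corpus):
    print(context, "->", target)
```

A real pretraining run does the same thing at scale: slide a context window over trillions of tokens and train the model to assign high probability to each next token.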
After pretraining, the model has a broad understanding of language, facts, and patterns. It can then be fine-tuned for specific tasks or aligned for specific behaviors.
Pretraining scale is measured in tokens (for text) or samples (for other modalities). GPT-4 and similar models were pretrained on trillions of tokens. Scaling laws suggest that more data and compute consistently improve model quality, though with diminishing returns.
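The diminishing-returns shape can be made concrete with a Chinchilla-style power law, where predicted loss falls as a power law in parameter count N and token count D. The constants below are illustrative (roughly the shape of published fits, not authoritative values):

```python
def scaling_law_loss(n_params, n_tokens,
                     E=1.69, A=406.4, B=410.7, alpha=0.34, beta=0.28):
    """Predicted pretraining loss under a Chinchilla-style power law.

    L(N, D) = E + A / N^alpha + B / D^beta

    E is an irreducible loss floor; the other two terms shrink as power
    laws in parameters N and training tokens D, so each doubling of
    data or model size buys less improvement than the last.
    """
    return E + A / n_params**alpha + B / n_tokens**beta

# Doubling the data keeps lowering predicted loss, but by less each time.
for d in (1e12, 2e12, 4e12):
    print(f"{d:.0e} tokens -> loss {scaling_law_loss(70e9, d):.4f}")
```

Running this shows both effects in the text above: quality improves monotonically with data, and the gap between successive doublings shrinks.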