Inference & Generation

Context Window

The maximum amount of text (measured in tokens) that an AI model can process in a single interaction.

The context window is the total amount of text — measured in tokens — that a language model can "see" and work with at one time. This includes the system prompt, the entire conversation history, any documents you paste in, and the model's output. When the context window fills up, older content must be dropped or summarized, or the request is rejected outright.
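Chat applications typically handle a full window by discarding the oldest turns until the conversation fits again. A minimal sketch of that strategy, using a crude whitespace word count in place of a real tokenizer (the message format and `count_tokens` helper are illustrative assumptions, not any particular provider's API):

```python
def truncate_history(messages, max_tokens, count_tokens):
    """Drop the oldest messages until the total fits within max_tokens.

    messages: list of {"role": ..., "content": ...} dicts, oldest first.
    count_tokens: callable estimating the token count of a string.
    """
    kept = list(messages)
    while kept and sum(count_tokens(m["content"]) for m in kept) > max_tokens:
        kept.pop(0)  # discard the oldest turn first
    return kept

# Crude estimator: real tokenizers (BPE etc.) count differently,
# but this is enough to illustrate the mechanism.
approx = lambda text: len(text.split())

history = [
    {"role": "user", "content": "first question " * 50},      # ~100 "tokens"
    {"role": "assistant", "content": "first answer " * 50},    # ~100 "tokens"
    {"role": "user", "content": "latest question"},            # 2 "tokens"
]
trimmed = truncate_history(history, max_tokens=60, count_tokens=approx)
# Only the most recent message fits the 60-token budget.
```

In practice an application would usually pin the system prompt and drop only conversational turns, or summarize the dropped turns instead of deleting them.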

Context window size has grown dramatically: GPT-3 launched with a roughly 2K-token window (later variants reached 4K, about 3,000 words), GPT-4 Turbo reached 128K tokens, and models like Gemini 1.5 Pro support up to 1 million tokens — enough for an entire codebase or roughly 10 novels. Larger context windows enable more complex tasks but require more memory and computation.

Tokens vs words: 1 token ≈ 0.75 words in English. A 128K context window holds roughly 96,000 words — about the length of a long novel. Non-English text and code typically tokenize less efficiently, so the same window holds fewer words.
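The rule of thumb above is easy to turn into a back-of-the-envelope converter. A sketch, keeping in mind that the 0.75 ratio is an English-language approximation and real tokenizer counts vary:

```python
TOKENS_PER_WORD = 1 / 0.75  # ≈ 1.33 tokens per English word

def tokens_to_words(tokens: int) -> int:
    """Rough English word capacity of a token budget."""
    return round(tokens * 0.75)

def words_to_tokens(words: int) -> int:
    """Rough token cost of an English text of the given word count."""
    return round(words * TOKENS_PER_WORD)

print(tokens_to_words(128_000))  # 96000 — matches the figure above
print(words_to_tokens(3_000))    # 4000 — roughly a 4K-token window
```

For real usage, count tokens with the model's own tokenizer rather than estimating, since billing and truncation are based on exact token counts.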

Why Context Window Matters

  • Determines how long a conversation can be before history is lost
  • Limits how much document content you can provide for analysis
  • Affects cost — more tokens in = higher API usage fees
  • Long-context models can do tasks impossible with short windows
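The cost point above is simple arithmetic once you know the per-token prices. A sketch with placeholder prices — the `$3` / `$10` per million tokens used here are illustrative, not any provider's actual rates:

```python
def estimate_cost(input_tokens: int, output_tokens: int,
                  price_in_per_m: float, price_out_per_m: float) -> float:
    """Estimate API cost in dollars, given prices per million tokens.

    Providers typically price input and output tokens separately.
    """
    return (input_tokens * price_in_per_m
            + output_tokens * price_out_per_m) / 1_000_000

# Example: a 100K-token document plus a 1K-token answer at
# hypothetical $3 (input) / $10 (output) per million tokens.
cost = estimate_cost(100_000, 1_000, 3.0, 10.0)
print(f"${cost:.2f}")
```

The asymmetry matters for long-context use: filling a large window with input on every request dominates the bill, which is one practical motivation for the retrieval approach described below.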

For tasks exceeding the context window, retrieval-augmented generation (RAG) is the standard solution. Instead of loading all documents into context, a retrieval system finds the most relevant chunks and injects only those. This keeps costs manageable and performance high even for large knowledge bases.
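The retrieval step can be sketched with a toy scoring function: rank chunks against the query and inject only the top few into the prompt. Word overlap stands in here for the vector-similarity search a production RAG system would use; the chunk texts are made up for illustration:

```python
def score(query: str, chunk: str) -> int:
    """Toy relevance score: number of lowercase words shared with the query."""
    return len(set(query.lower().split()) & set(chunk.lower().split()))

def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks scored most relevant to the query."""
    return sorted(chunks, key=lambda c: score(query, c), reverse=True)[:k]

chunks = [
    "The billing API charges per request.",
    "Context windows are measured in tokens.",
    "Our office is closed on holidays.",
]
query = "how are context windows measured"
top = retrieve(query, chunks, k=1)

# Only the retrieved chunks enter the prompt, not the whole knowledge base.
prompt = "Answer using only this context:\n" + "\n".join(top) + f"\n\nQ: {query}"
```

Real systems replace `score` with embedding similarity over a vector index, but the shape is the same: retrieve a handful of relevant chunks, then spend context-window tokens only on those.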
