Inference & Generation

Temperature

A parameter that controls the randomness of an AI model's outputs — lower values are more deterministic, higher values are more creative.

Temperature is a sampling parameter that controls how "random" or "creative" an LLM's output is. During generation, the model produces a probability distribution over all possible next tokens. Temperature rescales this distribution before sampling: the model's raw scores (logits) are divided by the temperature before the softmax. Low temperature (near 0) sharpens the distribution so the model almost always picks the highest-probability token — deterministic and conservative. High temperature (1.5–2) flattens the distribution, making lower-probability tokens much more likely — creative but potentially incoherent.
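The rescaling described above can be sketched in a few lines. This is a minimal illustration with invented toy logits for three candidate tokens, not a real model's vocabulary:

```python
import math

def softmax_with_temperature(logits, temperature):
    """Convert raw logits to probabilities, dividing by temperature first.
    Lower temperature sharpens the distribution; higher flattens it."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [4.0, 2.0, 1.0]  # toy logits for three candidate tokens

cold = softmax_with_temperature(logits, 0.2)  # near-greedy: top token dominates
warm = softmax_with_temperature(logits, 1.0)  # the raw distribution
hot  = softmax_with_temperature(logits, 2.0)  # flattened: tail tokens gain mass
```

Running this shows the top token's probability shrinking as temperature rises, while the lower-ranked tokens gain probability mass.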

Most practical applications use temperature between 0 and 1. For factual tasks like data extraction or code generation, use 0–0.3. For creative tasks like brainstorming or writing, use 0.7–1.0. Values above 1 are rarely useful in production.

Temperature = 0: Greedy decoding — always pick the most likely token. Effectively deterministic (the same prompt usually yields the same output). Best for structured tasks.
Temperature = 1: Sample from the raw probability distribution. Balanced creativity.
Temperature > 1: Amplifies randomness, often produces incoherent text.

Temperature vs Top-P

  • Temperature — scales all token probabilities up or down
  • Top-P (nucleus sampling) — only consider the smallest set of tokens whose cumulative probability reaches P
  • Usually tune one or the other, not both — adjusting both at once makes behavior harder to reason about
  • Top-P is often more predictable than temperature at the same "creativity" level
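The nucleus-sampling step described above can be sketched as a filter over an already-computed distribution. The probabilities here are toy values for illustration; real implementations apply this over the full vocabulary:

```python
def top_p_filter(probs, p):
    """Keep the smallest set of tokens whose cumulative probability
    reaches p, then renormalize. Returns (token_index, prob) pairs."""
    ranked = sorted(enumerate(probs), key=lambda x: x[1], reverse=True)
    kept, cumulative = [], 0.0
    for idx, prob in ranked:
        kept.append((idx, prob))
        cumulative += prob
        if cumulative >= p:
            break  # the nucleus is complete
    total = sum(prob for _, prob in kept)
    return [(idx, prob / total) for idx, prob in kept]

probs = [0.5, 0.3, 0.15, 0.05]      # toy next-token distribution
nucleus = top_p_filter(probs, 0.9)  # drops the 0.05 tail token
```

With p = 0.9, the first three tokens (cumulative probability 0.95) survive and are renormalized; the long tail is cut off entirely, which is why top-p often feels more predictable than raising temperature.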

When using an AI API, temperature is one of the most important parameters to tune. For a customer support bot, use temperature 0.1 for consistent, accurate responses. For a marketing copy generator, temperature 0.8 gives more variety. For code generation, temperature 0 or near 0 prevents the model from "improvising" incorrect syntax.
