
Learning Rate

A hyperparameter that controls how much the model weights change with each update during training.

The learning rate determines the step size the optimizer takes when updating model weights based on the gradient of the loss. Too high, and training becomes unstable or diverges; too low, and training is painfully slow or gets stuck in poor local minima.
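The update rule can be sketched as plain gradient descent, where each weight moves against its gradient scaled by the learning rate. The names here (`weights`, `grads`) are illustrative, not tied to any particular framework:

```python
def sgd_step(weights, grads, lr=1e-3):
    """One gradient-descent update: move each weight against its
    gradient, with the learning rate controlling the step size."""
    return [w - lr * g for w, g in zip(weights, grads)]

weights = [0.5, -0.2]
grads = [0.1, -0.4]
new_weights = sgd_step(weights, grads, lr=0.1)
```

A larger `lr` moves the weights further per update; this is exactly the instability/slowness trade-off described above.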

Modern training uses learning rate schedules — starting with a warmup phase, reaching a peak, then decaying over time. Cosine decay and linear decay are common choices.
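A warmup-then-cosine schedule can be sketched in a few lines. The specific values (`peak_lr`, `warmup_steps`, `total_steps`) are illustrative defaults, not prescriptions:

```python
import math

def lr_schedule(step, peak_lr=3e-4, warmup_steps=1000, total_steps=10000):
    """Linear warmup to peak_lr, then cosine decay toward zero."""
    if step < warmup_steps:
        # Warmup: ramp linearly from near zero up to the peak.
        return peak_lr * (step + 1) / warmup_steps
    # Cosine decay: fraction of the post-warmup phase completed.
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return peak_lr * 0.5 * (1 + math.cos(math.pi * progress))
```

At `step=0` the rate is tiny, at `step=warmup_steps` it hits the peak, and it falls smoothly to zero by `total_steps`.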

Typical values: 1e-3 to 1e-5 for deep learning. LLM fine-tuning often uses 1e-5 to 5e-5.

The learning rate is one of the most important hyperparameters to tune. A poorly chosen LR can sink an otherwise well-designed model. Adaptive optimizers like Adam adjust per-parameter rates automatically.
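Adam's per-parameter adaptation can be sketched for a single scalar parameter. This is a simplified illustration of the standard Adam update, with textbook default hyperparameters:

```python
import math

def adam_step(w, g, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam update for one parameter.

    m and v are exponential moving averages of the gradient and the
    squared gradient; dividing by sqrt(v_hat) scales the step per
    parameter, so the base lr is adjusted automatically."""
    m = b1 * m + (1 - b1) * g
    v = b2 * v + (1 - b2) * g * g
    m_hat = m / (1 - b1 ** t)  # bias correction for the zero-initialized averages
    v_hat = v / (1 - b2 ** t)
    w = w - lr * m_hat / (math.sqrt(v_hat) + eps)
    return w, m, v
```

Parameters with consistently large gradients accumulate a large `v` and take smaller effective steps, which is why Adam is less sensitive to the exact base learning rate than plain SGD.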
