
Batch Normalization

A layer that normalizes activations across a batch to stabilize and speed up training of deep neural networks.

Batch normalization normalizes each feature's activations using the mean and variance computed over the current batch, followed by a learnable scale (gamma) and shift (beta). At inference time, running averages of the batch statistics collected during training are used in place of per-batch statistics. It stabilizes training, permits higher learning rates, and acts as a mild regularizer.
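A minimal sketch of the training-time forward pass in NumPy (function and parameter names are illustrative, not from any particular library):

```python
import numpy as np

def batchnorm_forward(x, gamma, beta, eps=1e-5):
    # x: (batch, features) -- normalize each feature across the batch
    mean = x.mean(axis=0)
    var = x.var(axis=0)
    x_hat = (x - mean) / np.sqrt(var + eps)  # zero mean, unit variance per feature
    return gamma * x_hat + beta              # learnable scale and shift

# Activations with arbitrary mean and scale
x = np.random.randn(32, 4) * 5.0 + 3.0
out = batchnorm_forward(x, gamma=np.ones(4), beta=np.zeros(4))
```

With gamma = 1 and beta = 0, each feature of the output has approximately zero mean and unit variance over the batch; the learnable parameters let the network undo the normalization if that is useful.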

BatchNorm was a major breakthrough for training very deep networks. It helped ResNet and other deep architectures converge reliably where plain networks struggled.

Tradeoff: BatchNorm couples examples within a batch. With small batches the estimated statistics become noisy, and the dependence on batch composition complicates sequence models, online inference, and distributed training.

Modern transformers typically use layer normalization instead of BatchNorm: it normalizes across the features of each example rather than across the batch, so its behavior is independent of batch size and batch composition, which makes it more stable for sequence models.
