Residual Connection (Skip Connection)
A shortcut that adds a layer's input to its output, enabling much deeper networks by preserving gradient flow.
Residual connections, introduced in ResNet (He et al., 2015), add a layer's input x directly to the output of its transformation F(x), so the block computes y = F(x) + x (typically before the final activation). This identity shortcut lets gradients flow backward through many layers without vanishing.
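The block structure can be sketched in a few lines of NumPy. This is a minimal illustration, not the actual ResNet implementation (which uses convolutions and batch normalization); the names `residual_block`, `W1`, and `W2` are illustrative.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def residual_block(x, W1, W2):
    """Compute relu(F(x) + x): two linear layers form the residual
    branch F, and the input x is added back before the final activation."""
    h = relu(x @ W1)    # first transformation + nonlinearity
    f = h @ W2          # residual branch output F(x)
    return relu(f + x)  # shortcut: add the input, then activate

# toy usage with random weights (illustrative only)
rng = np.random.default_rng(0)
d = 8
x = rng.normal(size=(4, d))
W1 = rng.normal(size=(d, d)) * 0.1
W2 = rng.normal(size=(d, d)) * 0.1
y = residual_block(x, W1, W2)
```

Note that the shortcut requires F(x) and x to have the same shape; in ResNet, blocks that change dimensionality use a projection on the shortcut path.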
Before residual connections, plain networks much beyond ~20 layers were very hard to train: adding depth actually increased training error (the degradation problem). ResNet used shortcuts to train networks of 100+ layers and won the 2015 ImageNet (ILSVRC) classification challenge.
Core insight: instead of forcing a stack of layers to learn a full transformation H(x), let it learn only the residual F(x) = H(x) − x relative to the identity. When the optimal mapping is close to the identity, pushing F toward zero is much easier than fitting an identity mapping through nonlinear layers.
Residual connections are now standard in virtually every modern deep network. Transformers use them around each attention and feedforward sublayer (combined with layer normalization), which is what makes models with dozens or hundreds of layers trainable.
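The Transformer sublayer pattern can be sketched the same way. The pre-norm arrangement below (x + Sublayer(LayerNorm(x))) is one common variant, shown here as an assumption rather than the only layout; the feedforward weights and `residual_sublayer` name are illustrative.

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    """Normalize each row to zero mean and unit variance."""
    mu = x.mean(-1, keepdims=True)
    var = x.var(-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def residual_sublayer(x, sublayer):
    """Pre-norm residual wrapper: x + Sublayer(LayerNorm(x)).
    The identity path carries x through untouched, so stacking
    many such wrappers preserves gradient flow."""
    return x + sublayer(layer_norm(x))

# toy "feedforward" sublayer standing in for attention or an MLP
rng = np.random.default_rng(1)
d = 8
W = rng.normal(size=(d, d)) * 0.1
ff = lambda h: np.maximum(h @ W, 0.0)

x = rng.normal(size=(2, d))
y = residual_sublayer(x, ff)
```

A full Transformer layer applies this wrapper twice: once around the attention sublayer and once around the feedforward sublayer.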