Residual Connection (Skip Connection)
A shortcut that adds a layer's input to its output, enabling much deeper networks by preserving gradient flow.
Residual connections, introduced in ResNet (He et al., 2015), add a layer's input x directly to the output of its transformation F(x), so the block computes y = F(x) + x (typically before the final activation). This identity shortcut lets gradients flow backward through many layers without vanishing.
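The block structure can be sketched in a few lines of NumPy. This is a minimal illustration, not the actual ResNet implementation (which uses convolutions and batch normalization); the names `residual_block`, `W1`, and `W2` are illustrative.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def residual_block(x, W1, W2):
    """Compute relu(F(x) + x): two linear layers form the residual
    branch F, and the input x is added back before the final activation."""
    h = relu(x @ W1)    # first transformation + nonlinearity
    f = h @ W2          # residual branch output F(x)
    return relu(f + x)  # shortcut: add the input, then activate

# toy usage with random weights (illustrative only)
rng = np.random.default_rng(0)
d = 8
x = rng.normal(size=(4, d))
W1 = rng.normal(size=(d, d)) * 0.1
W2 = rng.normal(size=(d, d)) * 0.1
y = residual_block(x, W1, W2)
```

Note that the shortcut requires F(x) and x to have the same shape; in ResNet, blocks that change dimensionality use a projection on the shortcut path.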
Before residual connections, plain networks much beyond ~20 layers were very hard to train: adding depth actually increased training error (the degradation problem). ResNet used shortcuts to train networks of 100+ layers and won the 2015 ImageNet (ILSVRC) classification challenge.
Core insight: instead of forcing a stack of layers to learn a full transformation H(x), let it learn only the residual F(x) = H(x) − x relative to the identity. When the optimal mapping is close to the identity, pushing F toward zero is much easier than fitting an identity mapping through nonlinear layers.
Residual connections are now standard in virtually every modern deep network. Transformers use them around each attention and feedforward sublayer (combined with layer normalization), which is what makes models with dozens or hundreds of layers trainable.
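The Transformer sublayer pattern can be sketched the same way. The pre-norm arrangement below (x + Sublayer(LayerNorm(x))) is one common variant, shown here as an assumption rather than the only layout; the feedforward weights and `residual_sublayer` name are illustrative.

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    """Normalize each row to zero mean and unit variance."""
    mu = x.mean(-1, keepdims=True)
    var = x.var(-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def residual_sublayer(x, sublayer):
    """Pre-norm residual wrapper: x + Sublayer(LayerNorm(x)).
    The identity path carries x through untouched, so stacking
    many such wrappers preserves gradient flow."""
    return x + sublayer(layer_norm(x))

# toy "feedforward" sublayer standing in for attention or an MLP
rng = np.random.default_rng(1)
d = 8
W = rng.normal(size=(d, d)) * 0.1
ff = lambda h: np.maximum(h @ W, 0.0)

x = rng.normal(size=(2, d))
y = residual_sublayer(x, ff)
```

A full Transformer layer applies this wrapper twice: once around the attention sublayer and once around the feedforward sublayer.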