Training & Learning
Dropout
A regularization technique that randomly deactivates a fraction of neurons during training to prevent co-adaptation and overfitting.
Dropout randomly sets a fraction of neuron outputs to zero during each training step. This prevents neurons from relying too heavily on each other and forces the network to learn redundant, robust features.
During inference, dropout is turned off and all neurons are active. To keep expected activations consistent, outputs are rescaled: in the original formulation, activations are scaled down at inference time, while the now-standard "inverted dropout" variant instead scales surviving activations up by 1/(1 − p) during training so inference needs no adjustment. The result acts like an implicit ensemble of many subnetworks.
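The mechanics above can be sketched in a few lines of NumPy. This is a minimal illustration of inverted dropout, not a framework implementation; the function name and signature are chosen for this example.

```python
import numpy as np

def dropout(x, p=0.5, training=True, rng=None):
    """Inverted dropout: during training, zero each activation with
    probability p and scale survivors by 1/(1-p) so the expected
    value is unchanged; at inference, pass inputs through untouched."""
    if not training or p == 0.0:
        return x
    rng = rng or np.random.default_rng()
    mask = rng.random(x.shape) >= p        # keep each unit with prob 1-p
    return x * mask / (1.0 - p)            # rescale to preserve expectation

# Training: with p=0.5, kept activations are doubled, the rest are zeroed.
x = np.ones((2, 4))
y = dropout(x, p=0.5, training=True, rng=np.random.default_rng(0))

# Inference: the input passes through unchanged.
z = dropout(x, p=0.5, training=False)
```

Because the rescaling happens at training time, the same layer can be dropped into a model and simply switched off for evaluation, which is how most deep learning frameworks implement it.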
Typical values: dropout rates of 0.1-0.5 for fully connected layers. Modern transformers often use smaller values (0.05-0.1) or skip dropout entirely.
Dropout, introduced by Hinton and collaborators in 2012, was a major breakthrough that enabled deeper networks to train without overfitting. It remains one of the most effective and widely used regularization techniques.