The Implicit and Explicit Regularization Effects of Dropout
Colin Wei, Sham Kakade, Tengyu Ma

TL;DR
This paper shows that dropout regularizes neural networks in two ways: explicitly, by changing the expected training objective, and implicitly, through the stochasticity of the dropout update. It derives analytic characterizations of both effects that replicate dropout's benefits.
Contribution
It disentangles the explicit and implicit regularization effects of dropout and derives analytic simplifications that accurately model these effects.
Findings
The explicit and implicit effects of dropout can be disentangled through controlled experiments and quantified separately.
Analytic regularizers derived from these two effects can replace dropout effectively.
The implicit effect is analogous to the effect of stochasticity in small mini-batch SGD.
Abstract
Dropout is a widely used regularization technique, often required to obtain state-of-the-art results for a number of architectures. This work demonstrates that dropout introduces two distinct but entangled regularization effects: an explicit effect (also studied in prior work), which occurs because dropout modifies the expected training objective, and, perhaps surprisingly, an additional implicit effect from the stochasticity in the dropout training update. This implicit regularization effect is analogous to the effect of stochasticity in small mini-batch stochastic gradient descent. We disentangle these two effects through controlled experiments. We then derive analytic simplifications which characterize each effect in terms of the derivatives of the model and the loss, for deep neural networks. We demonstrate these simplified, analytic regularizers accurately capture the important aspects of…
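The split between the two effects can be made concrete on a toy problem. The sketch below is an illustration only, not the paper's derivation for deep networks: it assumes a linear model with squared loss and inverted dropout applied to the input with keep probability `q`, and all names (`dropout_loss`, `dropout_grad`, `enumerate_masks`) are hypothetical. In this setting the expected dropout loss equals the clean loss plus a closed-form penalty (the explicit effect), while the per-mask gradient deviates from its expectation by zero-mean noise (the source of the implicit effect). Every mask is enumerated, so the expectations are exact rather than sampled.

```python
import itertools
import numpy as np

def dropout_loss(w, x, y, mask, q):
    """Squared loss of a linear model with inverted dropout on the input."""
    return (y - w @ (mask * x / q)) ** 2

def dropout_grad(w, x, y, mask, q):
    """Gradient of dropout_loss with respect to w for one fixed mask."""
    x_dropped = mask * x / q
    return -2.0 * (y - w @ x_dropped) * x_dropped

def enumerate_masks(d, q):
    """Yield each binary mask of length d with its probability (keep prob q)."""
    for bits in itertools.product([0.0, 1.0], repeat=d):
        mask = np.array(bits)
        kept = int(mask.sum())
        yield mask, q ** kept * (1.0 - q) ** (d - kept)

# Toy data: one example, four input features (assumed values for illustration).
rng = np.random.default_rng(0)
d, q = 4, 0.8
w, x, y = rng.normal(size=d), rng.normal(size=d), 0.5

# Explicit effect: expected dropout loss = clean loss + penalty.
expected_loss = sum(p * dropout_loss(w, x, y, m, q) for m, p in enumerate_masks(d, q))
clean_loss = (y - w @ x) ** 2
explicit_reg = (1.0 - q) / q * np.sum(w ** 2 * x ** 2)

# The expected gradient is the gradient of the regularized objective.
expected_grad = sum(p * dropout_grad(w, x, y, m, q) for m, p in enumerate_masks(d, q))
regularized_grad = -2.0 * (y - w @ x) * x + 2.0 * (1.0 - q) / q * w * x ** 2

# Implicit effect: per-mask gradient noise has zero mean but nonzero variance.
noise_mean = sum(
    p * (dropout_grad(w, x, y, m, q) - expected_grad) for m, p in enumerate_masks(d, q)
)
noise_var = sum(
    p * (dropout_grad(w, x, y, m, q) - expected_grad) ** 2 for m, p in enumerate_masks(d, q)
)

print("expected dropout loss:", expected_loss)
print("clean loss + explicit regularizer:", clean_loss + explicit_reg)
print("mean of gradient noise:", noise_mean)
print("variance of gradient noise:", noise_var)
```

Training on `clean_loss + explicit_reg` with exact gradients would reproduce only the explicit effect in this toy setting. The remaining difference from real dropout training is the update noise summarized by `noise_var`, which plays the role that mini-batch sampling noise plays in SGD.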
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Adversarial Robustness in Machine Learning · Domain Adaptation and Few-Shot Learning
Methods: Dropout
