Dropout Training as Adaptive Regularization
Stefan Wager, Sida Wang, and Percy Liang

TL;DR
This paper interprets dropout as an adaptive regularizer linked to Fisher information and AdaGrad, proposing a semi-supervised approach that enhances dropout's effectiveness in document classification.
Contribution
It recasts dropout for generalized linear models as an adaptive regularizer, connects that regularizer to Fisher information and AdaGrad, and uses this view to build a semi-supervised method in which unlabeled data improves the regularizer.
Findings
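For generalized linear models, the dropout regularizer is first-order equivalent to an L2 penalty applied after scaling the features by an estimate of the inverse diagonal Fisher information.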
A semi-supervised algorithm leveraging unlabeled data improves dropout training.
The method improves on state-of-the-art results on the IMDB reviews dataset.
Abstract
Dropout and other feature noising schemes control overfitting by artificially corrupting the training data. For generalized linear models, dropout performs a form of adaptive regularization. Using this viewpoint, we show that the dropout regularizer is first-order equivalent to an L2 regularizer applied after scaling the features by an estimate of the inverse diagonal Fisher information matrix. We also establish a connection to AdaGrad, an online learning algorithm, and find that a close relative of AdaGrad operates by repeatedly solving linear dropout-regularized problems. By casting dropout as regularization, we develop a natural semi-supervised algorithm that uses unlabeled data to create a better adaptive regularizer. We apply this idea to document classification tasks, and show that it consistently boosts the performance of dropout training, improving on state-of-the-art results on…
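The equivalence stated in the abstract can be illustrated numerically. The sketch below is not the authors' code. It assumes logistic regression, a dropout rate `delta`, and a quadratic approximation of the dropout penalty of the form (1/2) · delta/(1 − delta) · Σ_j D_j β_j², where D = diag(Xᵀ V X) and V holds the per-example variances p_i(1 − p_i). It also assumes the rescaling uses the square root of D, so that features are divided by D^(1/2) and coefficients multiplied by it. The function names are hypothetical, and the weighting n / (n + alpha · m) in `semi_supervised_penalty` is one assumed way to combine n labeled and m unlabeled examples. The point it makes is that the penalty depends only on the features and the current coefficients, never on the labels, which is why unlabeled rows can contribute to it.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def fisher_diag(X, beta):
    """Diagonal of X^T V X with V = diag(p_i * (1 - p_i)) (logistic regression)."""
    p = sigmoid(X @ beta)
    return ((p * (1.0 - p))[:, None] * X ** 2).sum(axis=0)


def dropout_penalty_quadratic(X, beta, delta):
    """Assumed quadratic approximation to the dropout penalty (see lead-in)."""
    return 0.5 * delta / (1.0 - delta) * np.sum(fisher_diag(X, beta) * beta ** 2)


def rescale(X, beta):
    """Divide features by sqrt(D) and multiply coefficients by sqrt(D).

    Requires every feature to be nonzero in at least one row so that D > 0.
    The linear predictor X @ beta is unchanged by this reparameterization.
    """
    d = fisher_diag(X, beta)
    return X / np.sqrt(d), np.sqrt(d) * beta


def semi_supervised_penalty(X_lab, X_unlab, beta, delta, alpha=1.0):
    """Hypothetical variant: unlabeled rows also contribute to D.

    alpha discounts the unlabeled rows; the n / (n + alpha * m) factor keeps
    the penalty on the scale of the n labeled examples (an assumption).
    """
    n, m = len(X_lab), len(X_unlab)
    d = fisher_diag(X_lab, beta) + alpha * fisher_diag(X_unlab, beta)
    return 0.5 * delta / (1.0 - delta) * (n / (n + alpha * m)) * np.sum(d * beta ** 2)


# Synthetic data, for illustration only.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4))
beta = rng.normal(size=4)
delta = 0.5

X_tilde, gamma = rescale(X, beta)
penalty = dropout_penalty_quadratic(X, beta, delta)
l2_on_rescaled = 0.5 * delta / (1.0 - delta) * np.sum(gamma ** 2)
print(penalty, l2_on_rescaled)  # the two values agree up to rounding
```

Because D is recomputed from the current `beta`, the scaling adapts during training: a feature that is active mostly on confidently predicted examples (p_i near 0 or 1) gets a small D_j and is penalized lightly.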
Taxonomy
Topics: Machine Learning and Data Classification · Face and Expression Recognition · Machine Learning and Algorithms
Methods: AdaGrad
