Dropout Training as Adaptive Regularization

Stefan Wager, Sida Wang, and Percy Liang

arXiv:1307.1493·stat.ML·November 4, 2013·274 cites

Open Access

TL;DR

This paper interprets dropout as an adaptive regularizer linked to Fisher information and AdaGrad, proposing a semi-supervised approach that enhances dropout's effectiveness in document classification.

Contribution

The paper recasts dropout as an adaptive regularizer, connects it to Fisher information and AdaGrad, and develops a semi-supervised method that uses unlabeled data to improve dropout training.

Findings

01

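For generalized linear models, dropout acts as an adaptive regularizer: to first order, it is equivalent to an L2 penalty applied after scaling the features by an estimate of the inverse diagonal Fisher information matrix.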

02

A semi-supervised algorithm leveraging unlabeled data improves dropout training.

03

The method achieves state-of-the-art results on the IMDB reviews dataset.

Abstract

Dropout and other feature noising schemes control overfitting by artificially corrupting the training data. For generalized linear models, dropout performs a form of adaptive regularization. Using this viewpoint, we show that the dropout regularizer is first-order equivalent to an L2 regularizer applied after scaling the features by an estimate of the inverse diagonal Fisher information matrix. We also establish a connection to AdaGrad, an online learning algorithm, and find that a close relative of AdaGrad operates by repeatedly solving linear dropout-regularized problems. By casting dropout as regularization, we develop a natural semi-supervised algorithm that uses unlabeled data to create a better adaptive regularizer. We apply this idea to document classification tasks, and show that it consistently boosts the performance of dropout training, improving on state-of-the-art results on…
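The equivalence described in the abstract can be illustrated with a minimal sketch for logistic regression. It assumes the usual quadratic approximation, in which the penalty is one half of the per-example curvature times the variance that dropout noise induces in the linear predictor. The function names, the dropout rate `delta`, and the random data are hypothetical choices for illustration and are not taken from the paper's code.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def dropout_quadratic_penalty(X, beta, delta):
    """Quadratic approximation to the dropout penalty (logistic regression).

    Assumption: each feature is zeroed with probability delta and the
    survivors are rescaled by 1 / (1 - delta), so the noised predictor has
    variance delta / (1 - delta) * sum_j x_ij^2 * beta_j^2 for example i.
    """
    p = sigmoid(X @ beta)
    curvature = p * (1.0 - p)            # second derivative of the log-partition
    noise_var = (X ** 2) @ (beta ** 2)   # per-example sum_j x_ij^2 * beta_j^2
    return 0.5 * delta / (1.0 - delta) * np.sum(curvature * noise_var)


def fisher_scaled_l2(X, beta, delta):
    """The same quantity written as an L2 penalty weighted by the diagonal
    of a Fisher information estimate, diag(X^T V X), evaluated at beta."""
    p = sigmoid(X @ beta)
    fisher_diag = (X ** 2).T @ (p * (1.0 - p))
    return 0.5 * delta / (1.0 - delta) * np.sum(fisher_diag * beta ** 2)


# Hypothetical data; note that no labels appear anywhere in the penalty.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 5))
beta = rng.normal(size=5)

pen_a = dropout_quadratic_penalty(X, beta, delta=0.5)
pen_b = fisher_scaled_l2(X, beta, delta=0.5)
```

The two functions compute the same number by summing in a different order. Because the penalty depends only on the features and the current weights, it can also be evaluated on unlabeled examples, which is the property the semi-supervised algorithm relies on.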

Taxonomy

Topics: Machine Learning and Data Classification · Face and Expression Recognition · Machine Learning and Algorithms

Methods: AdaGrad