Altitude Training: Strong Bounds for Single-Layer Dropout

Stefan Wager, William Fithian, Sida Wang, and Percy Liang

arXiv:1407.3289 · stat.ML · November 3, 2014 · 20 cites

TL;DR

This paper gives a theoretical explanation for why dropout helps high-dimensional single-layer models: under a generative Poisson topic model with long documents, dropout improves the exponent in the generalization bound for empirical risk minimization and preserves the Bayes decision boundary.

Contribution

The paper introduces a theoretical framework that explains dropout's effectiveness in high-dimensional single-layer settings, focusing on generalization bounds and preservation of the Bayes decision boundary.

Findings

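01. Under a generative Poisson topic model with long documents, dropout improves the exponent in the generalization bound for empirical risk minimization.

02. Under similar conditions, dropout preserves the Bayes decision boundary.

03. Because the decision boundary is preserved, dropout should induce minimal bias in high dimensions.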

Abstract

Dropout training, originally designed for deep neural networks, has been successful on high-dimensional single-layer natural language tasks. This paper proposes a theoretical explanation for this phenomenon: we show that, under a generative Poisson topic model with long documents, dropout training improves the exponent in the generalization bound for empirical risk minimization. Dropout achieves this gain much like a marathon runner who practices at altitude: once a classifier learns to perform reasonably well on training examples that have been artificially corrupted by dropout, it will do very well on the uncorrupted test set. We also show that, under similar conditions, dropout preserves the Bayes decision boundary and should therefore induce minimal bias in high dimensions.
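The "altitude training" idea in the abstract can be illustrated with a minimal sketch: train a single-layer classifier on dropout-corrupted documents, then evaluate it on uncorrupted ones. Everything below is an assumption made for illustration and not the authors' setup: logistic regression stands in for the single-layer model, two classes of synthetic Poisson word counts stand in for the topic model, survivors are rescaled by 1/(1 - delta) (the common "inverted dropout" convention), and the function names, word rates, and hyperparameters are hypothetical.

```python
import numpy as np


def sample_poisson_docs(n, d, rng):
    """Draw n synthetic 'documents' as Poisson word counts over d >= 20 words.

    Hypothetical generative setup: every word has rate 1.0, except that
    class 1 boosts words 0-9 and class 0 boosts words 10-19 to rate 3.0.
    """
    y = np.arange(n) % 2               # alternate labels so classes are balanced
    rates = np.ones((n, d))
    rates[y == 1, :10] = 3.0
    rates[y == 0, 10:20] = 3.0
    X = rng.poisson(rates).astype(float)
    return X, y


def dropout_corrupt(X, delta, rng):
    """Zero each entry independently with probability delta.

    Surviving entries are rescaled by 1 / (1 - delta) so that the corrupted
    features match the originals in expectation (an assumed convention).
    """
    mask = rng.random(X.shape) >= delta
    return X * mask / (1.0 - delta)


def train_dropout_logreg(X, y, delta=0.5, lr=0.01, epochs=300, seed=0):
    """Full-batch gradient descent on the logistic loss, with a fresh
    dropout corruption of the training set drawn at every epoch."""
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        X_tilde = dropout_corrupt(X, delta, rng)
        p = 1.0 / (1.0 + np.exp(-(X_tilde @ w)))   # predicted P(y = 1)
        grad = X_tilde.T @ (p - y) / len(y)        # gradient of the mean logistic loss
        w -= lr * grad
    return w


def predict(X, w):
    """Classify uncorrupted documents by the sign of the linear score."""
    return (X @ w > 0).astype(int)


if __name__ == "__main__":
    X_train, y_train = sample_poisson_docs(400, 50, np.random.default_rng(0))
    X_test, y_test = sample_poisson_docs(400, 50, np.random.default_rng(1))
    w = train_dropout_logreg(X_train, y_train, delta=0.5)
    accuracy = np.mean(predict(X_test, w) == y_test)
    print(f"accuracy on uncorrupted test documents: {accuracy:.3f}")
```

The classifier only ever sees corrupted documents during training, while `predict` is applied to clean ones, which mirrors the train-at-altitude, race-at-sea-level analogy. The sketch does not reproduce the paper's bounds; it only shows the training procedure the bounds are about.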

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Domain Adaptation and Few-Shot Learning

Methods: Dropout