Altitude Training: Strong Bounds for Single-Layer Dropout
Stefan Wager, William Fithian, Sida Wang, and Percy Liang

TL;DR
This paper provides a theoretical explanation for why dropout improves generalization in single-layer models. Under a Poisson topic model, it shows that dropout improves the exponent in generalization bounds and preserves the Bayes decision boundary.
Contribution
It introduces a theoretical framework explaining dropout's effectiveness in high-dimensional single-layer settings, focusing on generalization bounds and decision boundary preservation.
Findings
Dropout improves the exponent in generalization bounds.
Dropout preserves the Bayes decision boundary.
Dropout induces minimal bias in high dimensions.
Abstract
Dropout training, originally designed for deep neural networks, has been successful on high-dimensional single-layer natural language tasks. This paper proposes a theoretical explanation for this phenomenon: we show that, under a generative Poisson topic model with long documents, dropout training improves the exponent in the generalization bound for empirical risk minimization. Dropout achieves this gain much like a marathon runner who practices at altitude: once a classifier learns to perform reasonably well on training examples that have been artificially corrupted by dropout, it will do very well on the uncorrupted test set. We also show that, under similar conditions, dropout preserves the Bayes decision boundary and should therefore induce minimal bias in high dimensions.
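The training procedure described in the abstract can be illustrated with a small sketch. A logistic-regression classifier is fit by SGD on dropout-corrupted bag-of-words counts and then scored on the clean counts. This is not the authors' code. The function names (`dropout_counts`, `train_dropout_logreg`, `predict`), the toy corpus, the learning rate, and the convention of rescaling surviving counts by 1/(1 − δ) are all illustrative assumptions. Dropping each token independently with probability δ is a binomial thinning of the counts, so under a Poisson topic model a thinned count with rate λ is again Poisson, with rate (1 − δ)λ.

```python
import math
import random


def dropout_counts(counts, delta, rng):
    """Hypothetical helper: corrupt a bag-of-words count vector with dropout.

    Each token occurrence is dropped independently with probability `delta`.
    Surviving counts are rescaled by 1 / (1 - delta) (an assumed convention)
    so the corrupted feature is an unbiased estimate of the original count.
    """
    if not 0.0 <= delta < 1.0:
        raise ValueError("delta must lie in [0, 1)")
    scale = 1.0 / (1.0 - delta)
    corrupted = []
    for c in counts:
        kept = sum(1 for _ in range(c) if rng.random() >= delta)
        corrupted.append(kept * scale)
    return corrupted


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_dropout_logreg(docs, labels, delta=0.5, lr=0.1, epochs=200, seed=0):
    """Hypothetical trainer: SGD on the logistic loss, drawing a fresh dropout
    corruption of each training document at every visit ("training at altitude")."""
    rng = random.Random(seed)
    w = [0.0] * len(docs[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(docs, labels):
            x_tilde = dropout_counts(x, delta, rng)
            p = sigmoid(b + sum(wj * xj for wj, xj in zip(w, x_tilde)))
            g = p - y  # derivative of the logistic loss w.r.t. the score
            for j, xj in enumerate(x_tilde):
                w[j] -= lr * g * xj
            b -= lr * g
    return w, b


def predict(w, b, x):
    # Test time: score the uncorrupted document.
    return int(b + sum(wj * xj for wj, xj in zip(w, x)) > 0)


# Toy corpus (illustrative): word 0 signals class 1, word 1 signals class 0,
# word 2 appears equally in both classes.
docs = [[5, 0, 1], [4, 1, 0], [6, 0, 2], [0, 5, 1], [1, 4, 0], [0, 6, 2]]
labels = [1, 1, 1, 0, 0, 0]
w, b = train_dropout_logreg(docs, labels)
print([predict(w, b, x) for x in docs])
```

Because the rescaling multiplies every feature by the same constant, it does not change which linear decision boundaries a classifier can represent; some presentations of dropout omit it.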
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Domain Adaptation and Few-Shot Learning
Methods: Dropout
