Data Augmentation via Levy Processes
Stefan Wager, William Fithian, and Percy Liang

TL;DR
This paper introduces a data augmentation framework based on Levy processes: observed data are treated as a slice of a Levy process, and slicing the process at an earlier time yields pseudo-examples for training a classifier while preserving the Bayes decision boundary.
Contribution
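It presents a general Levy process-based approach to data augmentation that captures existing schemes such as Gaussian feature noising and dropout training, admits new generalizations, and encodes invariances (for example, that short snippets of a travel document are also about travel) while preserving the Bayes decision boundary.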
Findings
The framework preserves the Bayes decision boundary.
In the limit where time is rewound to 0, training on pseudo-examples is equivalent to fitting a generative model.
Popular schemes such as Gaussian feature noising and dropout training arise as special cases.
Abstract
If a document is about travel, we may expect that short snippets of the document should also be about travel. We introduce a general framework for incorporating these types of invariances into a discriminative classifier. The framework imagines data as being drawn from a slice of a Levy process. If we slice the Levy process at an earlier point in time, we obtain additional pseudo-examples, which can be used to train the classifier. We show that this scheme has two desirable properties: it preserves the Bayes decision boundary, and it is equivalent to fitting a generative model in the limit where we rewind time back to 0. Our construction captures popular schemes such as Gaussian feature noising and dropout training, as well as admitting new generalizations.
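The idea of slicing a process at an earlier time can be illustrated with a minimal sketch. It is not the authors' implementation; it relies only on two standard facts: a Poisson count process observed at time T, viewed at time alpha*T, is a binomial thinning of the observed counts, and Brownian motion observed at time 1, viewed at time alpha, is Gaussian with mean alpha*x and variance sigma^2 * alpha * (1 - alpha). The function names, the parameter names alpha and sigma, and the toy word-count vector are assumptions made for illustration.

```python
import numpy as np

def thin_counts(x, alpha, rng):
    """Pseudo-example for count features (hypothetical helper).

    Given Poisson-process counts x observed at time T, the counts at the
    earlier time alpha*T are Binomial(x, alpha): each word occurrence is
    kept independently with probability alpha.
    """
    return rng.binomial(x, alpha)

def brownian_slice(x, alpha, sigma, rng):
    """Pseudo-example for real-valued features (hypothetical helper).

    Given Brownian motion with scale sigma observed as x at time 1, its
    value at the earlier time alpha is N(alpha * x, sigma^2 * alpha * (1 - alpha)).
    """
    scale = sigma * np.sqrt(alpha * (1.0 - alpha))
    return alpha * x + rng.normal(0.0, scale, size=x.shape)

rng = np.random.default_rng(0)

# Toy word-count vector for one document (assumed data).
doc = np.array([3, 0, 5, 1])

# Four "short snippet" pseudo-documents, each keeping about half the words.
pseudo_docs = np.stack([thin_counts(doc, 0.5, rng) for _ in range(4)])

# One Gaussian pseudo-example for a real-valued feature vector.
features = np.array([1.0, -2.0, 0.5])
pseudo_features = brownian_slice(features, 0.8, 1.0, rng)
```

Each pseudo-example would keep the label of the example it was derived from, so the classifier is trained on the original data plus the thinned or noised copies. With alpha = 1 both helpers return the original example unchanged.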
Taxonomy
Topics: Bayesian Methods and Mixture Models · Neural Networks and Applications · Machine Learning and Algorithms
Methods: Dropout
