A Kernel Theory of Modern Data Augmentation
Tri Dao, Albert Gu, Alexander J. Ratner, Virginia Smith, Christopher De Sa, Christopher Ré

TL;DR
This paper develops a theoretical framework for understanding data augmentation in machine learning. It models augmentation as a Markov process and analyzes its effect on kernel classifiers, which yields practical ways to reduce training computation and to predict how useful a transformation will be.
Contribution
It introduces a novel kernel-based theoretical model of data augmentation, connecting it with invariant kernels, tangent propagation, and robust optimization, and demonstrates practical benefits.
Findings
Data augmentation can be modeled as a Markov process with natural kernel structures.
Augmentation effects on classifiers can be approximated by feature averaging and variance regularization.
The theory enables reducing training computation and predicting transformation utility.
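The second finding can be checked numerically. The sketch below is illustrative and not the authors' code: it assumes random Fourier features as the feature map, small Gaussian input noise as the augmentation, and a logistic loss with a fixed random weight vector (all names, sizes, and the noise scale are hypothetical choices). It compares the loss averaged over augmented copies with a first-order approximation (the loss at the averaged features) and a second-order approximation (which adds a variance penalty).

```python
import numpy as np

rng = np.random.default_rng(0)

def random_fourier_features(x, proj, phase):
    """Map inputs of shape (..., d) to features of shape (..., D)."""
    return np.sqrt(2.0 / proj.shape[1]) * np.cos(x @ proj + phase)

def logistic_loss(margin):
    """Logistic loss log(1 + exp(-margin)), computed stably."""
    return np.logaddexp(0.0, -margin)

def logistic_loss_curvature(margin):
    """Second derivative of the logistic loss with respect to the margin."""
    s = 1.0 / (1.0 + np.exp(-margin))
    return s * (1.0 - s)

# Hypothetical problem sizes: n points, d inputs, D features, m augmented copies.
n, d, D, m = 64, 5, 100, 200
x = rng.normal(size=(n, d))
y = rng.choice([-1.0, 1.0], size=n)
proj = rng.normal(size=(d, D))
phase = rng.uniform(0.0, 2.0 * np.pi, size=D)
w = rng.normal(size=D)  # fixed linear model in feature space

# Assumed augmentation: m noisy copies of every training point.
z = x[:, None, :] + 0.1 * rng.normal(size=(n, m, d))   # shape (n, m, d)
phi_z = random_fourier_features(z, proj, phase)        # shape (n, m, D)

# Loss averaged over all augmented copies.
margins = y[:, None] * (phi_z @ w)                     # shape (n, m)
exact = logistic_loss(margins).mean()

# First order: evaluate the loss once per point, at the averaged features.
psi = phi_z.mean(axis=1)                               # shape (n, D)
mean_margin = y * (psi @ w)
first_order = logistic_loss(mean_margin).mean()

# Second order: add half the loss curvature times the variance of w^T phi(z).
margin_var = margins.var(axis=1)
second_order = first_order + 0.5 * (
    logistic_loss_curvature(mean_margin) * margin_var
).mean()

print("augmented loss:      ", exact)
print("first-order approx:  ", first_order)
print("second-order approx: ", second_order)
```

Because the logistic loss is convex, the first-order value never exceeds the augmented loss (Jensen's inequality); the variance term accounts for most of the remaining gap when the augmentation is small.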
Abstract
Data augmentation, a technique in which a training set is expanded with class-preserving transformations, is ubiquitous in modern machine learning pipelines. In this paper, we seek to establish a theoretical framework for understanding data augmentation. We approach this from two directions: First, we provide a general model of augmentation as a Markov process, and show that kernels appear naturally with respect to this model, even when we do not employ kernel classification. Next, we analyze more directly the effect of augmentation on kernel classifiers, showing that data augmentation can be approximated by first-order feature averaging and second-order variance regularization components. These frameworks both serve to illustrate the ways in which data augmentation affects the downstream learning model, and the resulting analyses provide novel connections between prior work in…
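The "first-order feature averaging and second-order variance regularization" statement can be made concrete with a Taylor expansion. The block below is a sketch under assumed notation, none of which appears in the abstract itself: a linear model $w$ on a feature map $\phi$, a loss $\ell$ that is twice differentiable in its first argument, and an augmentation distribution $T(x)$ over transformed copies of $x$.

```latex
% Assumed notation: \phi (feature map), T(x) (augmentation distribution),
% \ell (loss), w (weights of a linear model in feature space).
\begin{align*}
g(w) &= \frac{1}{n}\sum_{i=1}^{n} \mathbb{E}_{z \sim T(x_i)}\,
        \ell\bigl(w^\top \phi(z);\, y_i\bigr)
        && \text{(objective on augmented data)} \\
\psi(x) &= \mathbb{E}_{z \sim T(x)}\,\phi(z), \qquad
\Sigma(x) = \operatorname{Cov}_{z \sim T(x)}\,\phi(z)
        && \text{(averaged features and their covariance)} \\
g(w) &\approx \frac{1}{n}\sum_{i=1}^{n} \ell\bigl(w^\top \psi(x_i);\, y_i\bigr)
        + \frac{1}{2n}\sum_{i=1}^{n}
          \ell''\bigl(w^\top \psi(x_i);\, y_i\bigr)\; w^\top \Sigma(x_i)\, w
\end{align*}
```

Expanding $\ell$ around $w^\top \psi(x_i)$, the linear term vanishes because $\mathbb{E}_{z \sim T(x_i)}[\phi(z) - \psi(x_i)] = 0$. The first sum is training on averaged features; the second is a data-dependent quadratic penalty on $w$ weighted by the feature variance that the augmentation induces.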
Taxonomy
Topics: Neural Networks and Applications · Gaussian Processes and Bayesian Inference · Stochastic Gradient Optimization Techniques
