A Kernel Theory of Modern Data Augmentation

Tri Dao; Albert Gu; Alexander J. Ratner; Virginia Smith; Christopher De Sa; Christopher Ré

arXiv:1803.06084 · cs.LG · March 21, 2019 · 67 cites

TL;DR

This paper develops a theoretical framework for understanding data augmentation in machine learning. It models augmentation as a Markov process and analyzes its effect on kernel classifiers, which yields practical ways to reduce training computation and to predict the utility of a transformation.

Contribution

The paper introduces a kernel-based theoretical model of data augmentation that connects it with invariant kernels, tangent propagation, and robust optimization. It also shows that the resulting analysis can be used to reduce training computation and to predict transformation utility.

Findings

01. Data augmentation can be modeled as a Markov process with natural kernel structures.

02. Augmentation effects on classifiers can be approximated by feature averaging and variance regularization.

03. The theory enables reducing training computation and predicting transformation utility.

Abstract

Data augmentation, a technique in which a training set is expanded with class-preserving transformations, is ubiquitous in modern machine learning pipelines. In this paper, we seek to establish a theoretical framework for understanding data augmentation. We approach this from two directions: First, we provide a general model of augmentation as a Markov process, and show that kernels appear naturally with respect to this model, even when we do not employ kernel classification. Next, we analyze more directly the effect of augmentation on kernel classifiers, showing that data augmentation can be approximated by first-order feature averaging and second-order variance regularization components. These frameworks both serve to illustrate the ways in which data augmentation affects the downstream learning model, and the resulting analyses provide novel connections between prior work in…
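
The feature-averaging and variance-regularization decomposition mentioned in the abstract can be illustrated numerically. The sketch below is not the authors' code. It assumes a linear model on fixed features with logistic loss, and it stands in for "augmentation" with small Gaussian noise added directly in feature space. The function names, the noise scale, and the sample sizes are all hypothetical choices made for illustration. It compares the exact loss averaged over augmented copies with a first-order term (loss at the averaged feature) and a second-order term (a penalty proportional to the variance of the augmented features along the weight vector).

```python
import numpy as np


def logistic_loss(margin):
    # log(1 + exp(-margin)), computed stably
    return np.logaddexp(0.0, -margin)


def logistic_loss_curvature(margin):
    # second derivative of log(1 + exp(-margin)) with respect to margin
    s = 1.0 / (1.0 + np.exp(-margin))
    return s * (1.0 - s)


def augmented_loss_terms(features, w, y):
    """Compare the exact augmented loss with its Taylor approximations.

    features: (n_aug, d) array, feature vectors of augmented copies of one
              training point (hypothetical stand-in for phi(t(x)))
    w:        (d,) weight vector of a linear model
    y:        label in {-1, +1}
    """
    # exact objective: average loss over all augmented copies
    exact = logistic_loss(y * (features @ w)).mean()

    # first-order term: loss evaluated at the averaged feature vector
    mean_feature = features.mean(axis=0)
    mean_margin = y * (mean_feature @ w)
    first_order = logistic_loss(mean_margin)

    # second-order term: variance of the features along w, weighted by
    # the curvature of the loss at the averaged margin
    cov = np.cov(features, rowvar=False, bias=True)
    penalty = 0.5 * (w @ cov @ w) * logistic_loss_curvature(mean_margin)

    return exact, first_order, first_order + penalty


# Assumed toy setup: Gaussian noise in feature space plays the role of
# the augmentation distribution.
rng = np.random.default_rng(0)
base_feature = rng.normal(size=5)
w = rng.normal(size=5)
features = base_feature + 0.1 * rng.normal(size=(1000, 5))

exact, first_order, second_order = augmented_loss_terms(features, w, y=1)
print("exact averaged loss:     ", exact)
print("feature averaging only:  ", first_order)
print("with variance penalty:   ", second_order)
```

Because the logistic loss is convex, the feature-averaging term alone never exceeds the exact averaged loss; in this toy setup the variance penalty closes most of the remaining gap. How well the approximation holds for real transformations and learned features is not something this sketch addresses.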

Taxonomy

Topics: Neural Networks and Applications · Gaussian Processes and Bayesian Inference · Stochastic Gradient Optimization Techniques