Data Augmentation as Feature Manipulation

Ruoqi Shen; Sébastien Bubeck; Suriya Gunasekar

arXiv:2203.01572·cs.LG·September 22, 2022·1 cites

TL;DR

This paper investigates how data augmentation influences the learning dynamics of neural networks. It finds that augmentation acts as a form of feature manipulation: it changes the relative importance of features so that informative but hard-to-learn features are more likely to be captured, and the effect is stronger in non-linear models.

Contribution

The paper provides a detailed theoretical analysis of how data augmentation affects the learning dynamics of a two-layer convolutional neural network in the multi-view data model of Allen-Zhu and Li [2020], supported by experimental evidence.

Findings

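01. Data augmentation alters the relative importance of features during learning, making informative but hard-to-learn features more likely to be captured.

02. The effect is more pronounced in non-linear models, such as neural networks.

03. Augmentation can therefore be viewed as a form of feature manipulation, not only as a way to enlarge the data set or enforce invariances.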
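
The idea that augmentation shifts the relative importance of features can be illustrated with a small sketch. Everything below is an assumption made for illustration: the two-feature toy data, the masking augmentation `mask_easy_feature`, and the linear logistic model are hypothetical and are not the paper's two-layer CNN or the multi-view data model. The sketch compares per-feature gradient magnitudes at initialization, which indicate how strongly each feature drives the first step of training.

```python
def initial_gradient_magnitudes(samples):
    """Per-feature gradient magnitude of the mean logistic loss at w = 0.

    For the loss log(1 + exp(-y * w.x)), the gradient at w = 0 is
    -0.5 * mean(y * x), so we return 0.5 * |mean(y * x_j)| for each feature j.
    """
    n = len(samples)
    dims = len(samples[0][0])
    return [abs(0.5 * sum(y * x[j] for x, y in samples) / n) for j in range(dims)]


def mask_easy_feature(samples):
    """Hypothetical augmentation: add a copy of each sample with feature 0 zeroed."""
    masked = [((0.0, x[1]), y) for x, y in samples]
    return samples + masked


# Toy data (assumed): feature 0 is "easy" (large magnitude),
# feature 1 is "hard" (small magnitude). Both are aligned with the label.
base = [((1.0, 0.2), 1), ((-1.0, -0.2), -1)] * 50

g_plain = initial_gradient_magnitudes(base)
g_aug = initial_gradient_magnitudes(mask_easy_feature(base))

# Ratio of hard-feature to easy-feature gradient magnitude.
ratio_plain = g_plain[1] / g_plain[0]
ratio_aug = g_aug[1] / g_aug[0]
print(ratio_plain, ratio_aug)
```

In this toy setting the hard-to-easy gradient ratio is about 0.2 without augmentation and about 0.4 with it, so the weak feature receives relatively more of the learning signal. The model here is linear; the paper's claim is that the effect is more pronounced for non-linear models, which this sketch does not attempt to show.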

Abstract

Data augmentation is a cornerstone of the machine learning pipeline, yet its theoretical underpinnings remain unclear. Is it merely a way to artificially augment the data set size? Or is it about encouraging the model to satisfy certain invariances? In this work we consider another angle, and we study the effect of data augmentation on the dynamics of the learning process. We find that data augmentation can alter the relative importance of various features, effectively making certain informative but hard-to-learn features more likely to be captured in the learning process. Importantly, we show that this effect is more pronounced for non-linear models, such as neural networks. Our main contribution is a detailed analysis of data augmentation on the learning dynamics for a two-layer convolutional neural network in the recently proposed multi-view data model by Allen-Zhu and Li [2020]. We…

Taxonomy

Topics: Anomaly Detection Techniques and Applications · Machine Learning and Data Classification · Domain Adaptation and Few-Shot Learning