Does Data Augmentation Lead to Positive Margin?

Shashank Rajput; Zhili Feng; Zachary Charles; Po-Ling Loh; Dimitris Papailiopoulos

arXiv:1905.03177·cs.LG·May 9, 2019·6 cites

Open Access

TL;DR

This paper investigates how data augmentation influences model robustness by analyzing the margin it enforces, revealing that significant robustness gains may require exponentially many augmented data points.

Contribution

The paper gives a theoretical analysis of the margin that data augmentation enforces on empirical risk minimizers, covering linear separators and a class of nonlinear models whose labeling is constant within small convex hulls of data points.

Findings

1. Lower bounds on augmented data points needed for positive margin
2. Common DA techniques may require exponentially many points for significant margin
3. Analysis applies to linear and specific nonlinear models

Abstract

Data augmentation (DA) is commonly used during model training, as it significantly improves test error and model robustness. DA artificially expands the training set by applying random noise, rotations, crops, or even adversarial perturbations to the input data. Although DA is widely used, its capacity to provably improve robustness is not fully understood. In this work, we analyze the robustness that DA begets by quantifying the margin that DA enforces on empirical risk minimizers. We first focus on linear separators, and then a class of nonlinear models whose labeling is constant within small convex hulls of data points. We present lower bounds on the number of augmented data points required for non-zero margin, and show that commonly used DA techniques may only introduce significant margin after adding exponentially many points to the data set.
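
The two quantities the abstract relates, noise-based augmentation and the margin of a linear separator, can be made concrete with a small sketch. This is an illustration only, not the paper's construction: the toy data, the function names `margin` and `augment_on_sphere`, the radius `r`, and the number of copies `k` are all assumptions chosen for the example.

```python
import numpy as np


def margin(w, b, X, y):
    """Smallest signed distance from a labeled point to the hyperplane w.x + b = 0.

    The value is positive only if every point is classified correctly.
    """
    return float(np.min(y * (X @ w + b)) / np.linalg.norm(w))


def augment_on_sphere(X, y, r, k, rng):
    """Append k noisy copies of each point, each at distance exactly r from it."""
    n, d = X.shape
    noise = rng.standard_normal((n, k, d))
    noise *= r / np.linalg.norm(noise, axis=2, keepdims=True)  # rescale to radius r
    X_new = (X[:, None, :] + noise).reshape(n * k, d)
    y_new = np.repeat(y, k)  # copies inherit the label of their source point
    return np.vstack([X, X_new]), np.concatenate([y, y_new])


# Toy data (hypothetical): two points in the plane with opposite labels.
X = np.array([[-1.0, 0.0], [1.0, 0.0]])
y = np.array([-1.0, 1.0])

rng = np.random.default_rng(0)
r, k = 0.5, 20
X_aug, y_aug = augment_on_sphere(X, y, r, k, rng)

w, b = np.array([1.0, 0.0]), 0.0            # a separator through the origin
w_bad, b_bad = np.array([1.0, 0.0]), 0.999  # fits the original data, barely

print("margin of (w, b) on original data:    ", margin(w, b, X, y))
print("margin of (w, b) on augmented data:   ", margin(w, b, X_aug, y_aug))
print("margin of (w_bad, b_bad) on original: ", margin(w_bad, b_bad, X, y))
print("margin of (w_bad, b_bad) on augmented:", margin(w_bad, b_bad, X_aug, y_aug))
```

For any fixed separator, the margin on the augmented set is no larger than on the original set, because the original points are retained, and no smaller than the original margin minus `r`, because each copy lies at distance `r` from its source. A separator such as `(w_bad, b_bad)`, which fits the original points with almost no margin, is ruled out as an empirical risk minimizer only if some sampled perturbation happens to fall on the wrong side of it.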

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Machine Learning and Algorithms · Machine Learning and Data Classification