Does Data Augmentation Lead to Positive Margin?
Shashank Rajput, Zhili Feng, Zachary Charles, Po-Ling Loh, Dimitris Papailiopoulos

TL;DR
This paper investigates how data augmentation influences model robustness by analyzing the margin it enforces, revealing that significant robustness gains may require exponentially many augmented data points.
Contribution
It provides a theoretical analysis linking data augmentation to the margin enforced on empirical risk minimizers, covering linear separators and a class of nonlinear models whose labeling is constant within small convex hulls of data points.
Findings
Lower bounds on augmented data points needed for positive margin
Common DA techniques may require exponentially many points for significant margin
Analysis applies to linear and specific nonlinear models
Abstract
Data augmentation (DA) is commonly used during model training, as it significantly improves test error and model robustness. DA artificially expands the training set by applying random noise, rotations, crops, or even adversarial perturbations to the input data. Although DA is widely used, its capacity to provably improve robustness is not fully understood. In this work, we analyze the robustness that DA begets by quantifying the margin that DA enforces on empirical risk minimizers. We first focus on linear separators, and then a class of nonlinear models whose labeling is constant within small convex hulls of data points. We present lower bounds on the number of augmented data points required for non-zero margin, and show that commonly used DA techniques may only introduce significant margin after adding exponentially many points to the data set.
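To make the quantity in the abstract concrete, the following is a minimal sketch (not the paper's construction) of how one might measure the margin that a linear empirical risk minimizer attains after noise-based augmentation. The function names `augment_with_noise`, `geometric_margin`, and `perceptron` are hypothetical, the Gaussian noise model and the separator through the origin are simplifying assumptions, and the perceptron is only a stand-in for "some zero-training-error separator"; the paper's bounds concern the margin guaranteed for any such minimizer, which a single run like this cannot establish.

```python
import numpy as np


def augment_with_noise(X, y, copies, sigma, rng):
    """Append `copies` Gaussian-perturbed versions of every point, keeping its label.

    Assumption: isotropic Gaussian noise with standard deviation `sigma`.
    """
    noise = rng.normal(scale=sigma, size=(copies,) + X.shape)
    X_aug = np.concatenate([X] + [X + n for n in noise], axis=0)
    y_aug = np.tile(y, copies + 1)  # labels repeat once per block of points
    return X_aug, y_aug


def geometric_margin(w, X, y):
    """Smallest signed distance from any labeled point to the hyperplane w.x = 0."""
    return float(np.min(y * (X @ w)) / np.linalg.norm(w))


def perceptron(X, y, max_epochs=1000):
    """Return one linear separator with zero training error, or None if none is found.

    Stand-in for an arbitrary empirical risk minimizer; labels must be +1 / -1.
    """
    w = np.zeros(X.shape[1])
    for _ in range(max_epochs):
        mistakes = 0
        for xi, yi in zip(X, y):
            if yi * (xi @ w) <= 0:  # misclassified or on the boundary
                w += yi * xi
                mistakes += 1
        if mistakes == 0:
            return w
    return None


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy, well-separated data (illustrative values only).
    X = np.array([[2.0, 1.0], [3.0, -1.0], [-2.0, 0.5], [-3.0, -1.0]])
    y = np.array([1.0, 1.0, -1.0, -1.0])
    for copies in (0, 10, 100):
        X_aug, y_aug = augment_with_noise(X, y, copies, sigma=0.1, rng=rng)
        w = perceptron(X_aug, y_aug)
        if w is not None:
            print(copies, geometric_margin(w, X, y))
```

The margin is evaluated on the original points while the separator is fit on the augmented set, which mirrors the question of how much margin augmentation enforces on the training data itself.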
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Machine Learning and Algorithms · Machine Learning and Data Classification
