Provable Benefit of Mixup for Finding Optimal Decision Boundaries
Junsoo Oh, Chulhee Yun

TL;DR
This paper shows that Mixup data augmentation reduces the sample complexity of finding optimal decision boundaries in binary classification, especially for highly separable data, and analyzes both its theoretical benefits and its limitations.
Contribution
The paper provides a theoretical analysis of Mixup's benefit in reducing sample complexity and introduces new concentration results for pair-wise augmented data.
Findings
Mixup mitigates the curse of separability by reducing sample complexity.
Vanilla training's sample complexity increases exponentially with data separability.
Masking-based Mixup variants can distort the training loss and lead to suboptimal classifiers.
Abstract
We investigate how pair-wise data augmentation techniques like Mixup affect the sample complexity of finding optimal decision boundaries in a binary linear classification problem. For a family of data distributions with a separability constant κ, we analyze how well the optimal classifier in terms of training loss aligns with the optimal one in test accuracy (i.e., Bayes optimal classifier). For vanilla training without augmentation, we uncover an interesting phenomenon named the curse of separability. As we increase κ to make the data distribution more separable, the sample complexity of vanilla training increases exponentially in κ; perhaps surprisingly, the task of finding optimal decision boundaries becomes harder for more separable distributions. For Mixup training, we show that Mixup mitigates this problem by significantly reducing the sample complexity. To…
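For readers unfamiliar with pair-wise augmentation, the following is a minimal sketch of the standard Mixup operation: each augmented point is a convex combination of two training points and of their labels. It is illustrative only and is not taken from the paper. The function names `mixup_pair` and `mixup_dataset`, the label encoding in {0, 1}, and the Beta(alpha, alpha) mixing weight are assumptions; the exact setup analyzed by the authors may differ.

```python
import random


def mixup_pair(x_i, y_i, x_j, y_j, lam):
    """Mix two labeled points with weight lam in [0, 1] (hypothetical helper)."""
    # Features and labels are interpolated with the same weight.
    x_mix = [lam * a + (1.0 - lam) * b for a, b in zip(x_i, x_j)]
    y_mix = lam * y_i + (1.0 - lam) * y_j
    return x_mix, y_mix


def mixup_dataset(xs, ys, num_pairs, alpha=1.0, seed=0):
    """Draw num_pairs random pairs (with replacement) and mix each one.

    Assumption: lam ~ Beta(alpha, alpha); alpha=1.0 gives a uniform weight.
    """
    rng = random.Random(seed)
    mixed = []
    for _ in range(num_pairs):
        i = rng.randrange(len(xs))
        j = rng.randrange(len(xs))
        lam = rng.betavariate(alpha, alpha)
        mixed.append(mixup_pair(xs[i], ys[i], xs[j], ys[j], lam))
    return mixed


# Two well-separated points, one per class (toy data, not from the paper).
xs = [[2.0, 0.0], [-2.0, 0.0]]
ys = [1.0, 0.0]

x_mix, y_mix = mixup_pair(xs[0], ys[0], xs[1], ys[1], 0.75)
augmented = mixup_dataset(xs, ys, num_pairs=5)
```

With a weight of 0.75, the mixed point lies three quarters of the way toward the first point, and its soft label is 0.75. Mixed points of this kind fill the gap between the two classes, which gives an intuition for why such augmentation could help when the original data are highly separable.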
Taxonomy
Topics: Machine Learning and Data Classification · Machine Learning and Algorithms · Imbalanced Data Classification Techniques
Methods: Test · Mixup
