Selective Mixup Helps with Distribution Shifts, But Not (Only) because of Mixup
Damien Teney, Jindong Wang, Ehsan Abbasnejad

TL;DR
Selective mixup improves neural network generalization under distribution shifts mainly through implicit resampling effects, which can be understood as a form of label shift correction, rather than solely due to data mixing.
Contribution
This paper reveals that the success of selective mixup is largely due to implicit resampling effects, establishing an equivalence with resampling methods and analyzing their combined benefits.
Findings
Selective mixup implicitly resamples data, aiding generalization.
Resampling methods are confirmed effective for distribution shifts.
Combining mixup and resampling yields improved results.
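To make the third finding concrete, here is a minimal sketch of pairing explicit class-balanced resampling with ordinary (non-selective) mixup. It is not the authors' implementation: the function names `balanced_indices` and `mixup`, the synthetic data, and the value `alpha=0.2` are assumptions chosen for illustration.

```python
import numpy as np

def balanced_indices(labels, n, rng):
    """Draw n indices so that every class is equally likely (hypothetical helper)."""
    labels = np.asarray(labels)
    counts = np.bincount(labels)
    weights = 1.0 / counts[labels]          # rarer class -> larger weight
    return rng.choice(labels.size, size=n, p=weights / weights.sum())

def mixup(x, y_onehot, rng, alpha=0.2):
    """Standard mixup on random pairs within a batch; alpha is an assumed value."""
    lam = rng.beta(alpha, alpha)
    perm = rng.permutation(len(x))
    x_mixed = lam * x + (1 - lam) * x[perm]
    y_mixed = lam * y_onehot + (1 - lam) * y_onehot[perm]
    return x_mixed, y_mixed

rng = np.random.default_rng(0)
labels = (rng.random(5_000) < 0.1).astype(int)    # synthetic data, ~10% positives
features = rng.normal(size=(labels.size, 4))

batch = balanced_indices(labels, n=2_000, rng=rng)            # explicit resampling
x_mix, y_mix = mixup(features[batch], np.eye(2)[labels[batch]], rng)  # then mixing
```

The resampling step alone brings the batch close to a 50/50 class split; the mixup step then interpolates inputs and one-hot targets within that rebalanced batch.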
Abstract
Mixup is a highly successful technique to improve generalization of neural networks by augmenting the training data with combinations of random pairs. Selective mixup is a family of methods that apply mixup to specific pairs, e.g. only combining examples across classes or domains. These methods have claimed remarkable improvements on benchmarks with distribution shifts, but their mechanisms and limitations remain poorly understood. We examine an overlooked aspect of selective mixup that explains its success in a completely new light. We find that the non-random selection of pairs affects the training distribution and improves generalization by means completely unrelated to the mixing. For example, in binary classification, mixup across classes implicitly resamples the data for a uniform class distribution - a classical solution to label shift. We show empirically that this implicit…
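The binary-classification example in the abstract can be checked with a small simulation. The sketch below assumes synthetic labels with roughly 10% positives and a hypothetical helper `cross_class_pairs`; it only illustrates the pair-selection effect and is not the paper's code.

```python
import numpy as np

def cross_class_pairs(labels, rng):
    """For each example, pick a random partner from the other class (binary labels)."""
    labels = np.asarray(labels)
    idx_by_class = {c: np.flatnonzero(labels == c) for c in (0, 1)}
    return np.array([rng.choice(idx_by_class[int(1 - y)]) for y in labels])

rng = np.random.default_rng(0)
labels = (rng.random(10_000) < 0.1).astype(int)   # imbalanced: ~10% positives
partners = cross_class_pairs(labels, rng)

# Pool both members of every selected pair and measure the class shares.
pooled = np.concatenate([labels, labels[partners]])
class_share = np.bincount(pooled, minlength=2) / pooled.size

# Mixed soft targets with a symmetric mixing coefficient (Beta(1, 1) is assumed).
lam = rng.beta(1.0, 1.0, size=labels.size)
mixed_targets = lam * labels + (1 - lam) * labels[partners]
```

Because every selected pair contains exactly one example of each class, the pooled examples are split evenly between the two classes regardless of the original imbalance, and the mixed targets average close to 0.5. This is the same class distribution that explicit uniform-class resampling would produce.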
Taxonomy
Topics: Machine Learning and Data Classification · Neural Networks and Applications · Domain Adaptation and Few-Shot Learning
Methods: Mixup
