Global Mixup: Eliminating Ambiguity with Clustering
Xiangjin Xie, Yangning Li, Wang Chen, Kai Ouyang, Li Jiang, and Haitao Zheng

TL;DR
Global Mixup introduces a two-stage data augmentation method that uses clustering to generate more reliable virtual samples, improving model performance across various neural network architectures and tasks.
Contribution
The paper proposes a novel two-stage augmentation approach that decouples sample generation from labeling using clustering, expanding sampling space and reducing label ambiguity.
Findings
Significantly outperforms state-of-the-art baselines on multiple tasks.
Effective in low-resource scenarios.
Applicable to CNN, LSTM, and BERT models.
Abstract
Data augmentation with Mixup has been proven an effective method to regularize the current deep neural networks. Mixup generates virtual samples and corresponding labels at once through linear interpolation. However, this one-stage generation paradigm and the use of linear interpolation have the following two defects: (1) The label of the generated sample is directly combined from the labels of the original sample pairs without reasonable judgment, which makes the labels likely to be ambiguous. (2) Linear combination significantly limits the sampling space for generating samples. To tackle these problems, we propose a novel and effective augmentation method based on global clustering relationships named Global Mixup. Specifically, we transform the previous one-stage augmentation process into two-stage, decoupling the process of generating virtual samples from the…
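For readers unfamiliar with the baseline, the one-stage interpolation that the abstract criticizes can be sketched as follows. This is a minimal illustration of standard (vanilla) Mixup only, not of Global Mixup; the function name `mixup`, the `alpha` default, and the toy vectors are assumptions made for this example.

```python
import random

def mixup(x_i, y_i, x_j, y_j, alpha=1.0, rng=random):
    """Vanilla Mixup on plain lists (illustrative helper, not the authors' code).

    The same weight lam interpolates both the inputs and the one-hot labels,
    so the sample and its label are produced in a single stage.
    """
    lam = rng.betavariate(alpha, alpha)  # mixing weight drawn from Beta(alpha, alpha)
    x_mix = [lam * a + (1.0 - lam) * b for a, b in zip(x_i, x_j)]
    y_mix = [lam * a + (1.0 - lam) * b for a, b in zip(y_i, y_j)]
    return x_mix, y_mix, lam

# Toy example: two 2-dimensional samples from two different classes.
rng = random.Random(0)  # fixed seed so the run is repeatable
x_mix, y_mix, lam = mixup([1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0], rng=rng)
```

Here the soft label `y_mix` depends only on `lam` and the two source labels, never on where `x_mix` falls relative to the rest of the training data. That is the kind of unchecked label assignment the abstract describes as a source of ambiguity.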
Taxonomy
Topics: Human Pose and Action Recognition · Domain Adaptation and Few-Shot Learning · Machine Learning and Data Classification
Methods: Mixup
