When Chosen Wisely, More Data Is What You Need: A Universal Sample-Efficient Strategy For Data Augmentation
Ehsan Kamalloo, Mehdi Rezagholizadeh, Ali Ghodsi

TL;DR
This paper introduces Glitter, a universal data augmentation method that adaptively selects high-loss samples from a pre-generated pool to improve training efficiency and performance across various NLP tasks.
Contribution
Glitter is a novel plug-in strategy for existing data augmentation methods. It improves sample efficiency and training speed by selecting worst-case samples from a pre-generated pool, without changing the training process.
Findings
Glitter is faster to train than baseline methods.
Glitter achieves competitive performance on NLP benchmarks.
Glitter is compatible with various training setups.
Abstract
Data Augmentation (DA) is known to improve the generalizability of deep neural networks. Most existing DA techniques naively add a certain number of augmented samples without considering the quality and the added computational cost of these samples. To tackle this problem, a common strategy, adopted by several state-of-the-art DA methods, is to adaptively generate or re-weight augmented samples with respect to the task objective during training. However, these adaptive DA methods: (1) are computationally expensive and not sample-efficient, and (2) are designed merely for a specific setting. In this work, we present a universal DA technique, called Glitter, to overcome both issues. Glitter can be plugged into any DA method, making training sample-efficient without sacrificing performance. From a pre-generated pool of augmented samples, Glitter adaptively selects a subset of worst-case…
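The abstract's core step selects worst-case samples from a pre-generated pool of augmentations. The sketch below illustrates that step under stated assumptions and is not the authors' implementation. The scoring loss (cross-entropy here), the per-example budget `k`, and the names `select_worst_case`, `toy_predict`, and `pool` are hypothetical, and the truncated abstract does not give the exact selection criterion.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple


def cross_entropy(probs: Sequence[float], label: int) -> float:
    """Negative log-likelihood of the gold label (assumed scoring loss)."""
    return -math.log(max(probs[label], 1e-12))


def select_worst_case(
    pool: Dict[str, List[Tuple[str, int]]],
    predict: Callable[[str], Sequence[float]],
    k: int,
) -> Dict[str, List[Tuple[str, int]]]:
    """Keep, per original example, the k augmented samples with the highest loss.

    `pool` maps an original example id to its pre-generated augmentations,
    each paired with the label it inherits. `predict` stands in for a
    forward pass of the current model; selection needs no gradients.
    """
    selected = {}
    for ex_id, augmented in pool.items():
        # Score every candidate with the current model, then rank by loss.
        scored = [(cross_entropy(predict(text), label), text, label)
                  for text, label in augmented]
        scored.sort(key=lambda t: t[0], reverse=True)
        selected[ex_id] = [(text, label) for _, text, label in scored[:k]]
    return selected


def toy_predict(text: str) -> List[float]:
    # Hypothetical 2-class "model": confidence in class 1 grows with length.
    p1 = min(len(text) / 10, 0.99)
    return [1 - p1, p1]


# Hypothetical pool: three augmentations of one positive example.
pool = {"ex1": [("good", 1), ("very good", 1), ("fine movie", 1)]}
result = select_worst_case(pool, toy_predict, k=1)
print(result)
```

Here the short augmentation "good" receives the lowest class-1 probability from the toy model, so it has the highest loss and is the one kept. Because selection only requires forward passes over an existing pool, the chosen subset can be fed to an otherwise unchanged training loop. This is the plug-in property the abstract describes, though the paper's exact integration may differ.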
Taxonomy
Topics: Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning · Medical Imaging and Analysis
Methods: Knowledge Distillation
