SapAugment: Learning A Sample Adaptive Policy for Data Augmentation
Ting-Yao Hu, Ashish Shrivastava, Jen-Hao Rick Chang, Hema Koppula, Stefan Braun, Kyuyeon Hwang, Ozlem Kalinli, Oncel Tuzel

TL;DR
SapAugment learns a data augmentation policy that adjusts augmentation strength to each sample's difficulty, as measured by its training loss, and reports up to 21% relative word error rate (WER) reduction on LibriSpeech.
Contribution
The paper proposes SapAugment, a method that learns a sample-specific augmentation policy based on training loss, integrating multiple augmentation techniques without manual tuning.
Findings
Up to 21% relative WER reduction on LibriSpeech
Effective adaptation of augmentation strength based on sample difficulty
Combines multiple augmentation methods into a unified framework
Abstract
Data augmentation methods usually apply the same augmentation (or a mix of them) to all the training samples. For example, to perturb data with noise, the noise is sampled from a Normal distribution with a fixed standard deviation, for all samples. We hypothesize that a hard sample with high training loss already provides strong training signal to update the model parameters and should be perturbed with mild or no augmentation. Perturbing a hard sample with a strong augmentation may also make it too hard to learn from. Furthermore, a sample with low training loss should be perturbed by a stronger augmentation to provide more robustness to a variety of conditions. To formalize these intuitions, we propose a novel method to learn a Sample-Adaptive Policy for Augmentation -- SapAugment. Our policy adapts the augmentation parameters based on the training loss of the data samples. In the…
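The intuition in the abstract can be illustrated with a minimal sketch. This is not the learned policy from the paper: it assumes a hand-written linear mapping from a sample's loss rank within a batch to the standard deviation of additive Gaussian noise, so that the lowest-loss sample gets the strongest noise and the highest-loss sample gets none. The function names and the `max_std` parameter are hypothetical.

```python
import random


def loss_ranks(losses):
    """Return each sample's loss rank scaled to [0, 1] (0 = lowest loss)."""
    n = len(losses)
    if n == 1:
        return [0.0]
    order = sorted(range(n), key=lambda i: losses[i])
    ranks = [0.0] * n
    for position, index in enumerate(order):
        ranks[index] = position / (n - 1)
    return ranks


def noise_std_per_sample(losses, max_std=0.1):
    """Assumed linear policy: easy (low-loss) samples get up to max_std,
    the hardest (highest-loss) sample gets zero noise."""
    return [max_std * (1.0 - r) for r in loss_ranks(losses)]


def augment_batch(batch, losses, max_std=0.1, seed=0):
    """Add Gaussian noise to each sample with its own standard deviation."""
    rng = random.Random(seed)
    stds = noise_std_per_sample(losses, max_std)
    return [
        [x + rng.gauss(0.0, s) for x in sample]
        for sample, s in zip(batch, stds)
    ]


if __name__ == "__main__":
    batch = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
    losses = [2.0, 0.1, 1.0]  # first sample is the hardest
    print(noise_std_per_sample(losses, max_std=0.2))
    print(augment_batch(batch, losses, max_std=0.2))
```

A fixed-strength baseline would use the same standard deviation for every sample; here the only change is that the strength depends on where each sample's loss falls within the batch. The abstract states that the policy is learned rather than fixed by hand like the linear mapping above.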
Taxonomy
Topics: Speech Recognition and Synthesis · Natural Language Processing Techniques · Music and Audio Processing
