SapAugment: Learning A Sample Adaptive Policy for Data Augmentation

Ting-Yao Hu; Ashish Shrivastava; Jen-Hao Rick Chang; Hema Koppula; Stefan Braun; Kyuyeon Hwang; Ozlem Kalinli; Oncel Tuzel

arXiv:2011.01156 · cs.LG · February 16, 2021 · 1 citation

Open Access

TL;DR

SapAugment learns a data augmentation policy that adapts augmentation strength to each sample's training loss, yielding up to 21% relative WER reduction on LibriSpeech.

Contribution

The paper proposes SapAugment, a method that learns a sample-specific augmentation policy based on training loss, integrating multiple augmentation techniques without manual tuning.

Findings

01. Up to 21% relative WER reduction on LibriSpeech

02. Effective adaptation of augmentation strength based on sample difficulty

03. Combines multiple augmentation methods into a unified framework

Abstract

Data augmentation methods usually apply the same augmentation (or a mix of them) to all the training samples. For example, to perturb data with noise, the noise is sampled from a Normal distribution with a fixed standard deviation, for all samples. We hypothesize that a hard sample with high training loss already provides strong training signal to update the model parameters and should be perturbed with mild or no augmentation. Perturbing a hard sample with a strong augmentation may also make it too hard to learn from. Furthermore, a sample with low training loss should be perturbed by a stronger augmentation to provide more robustness to a variety of conditions. To formalize these intuitions, we propose a novel method to learn a Sample-Adaptive Policy for Augmentation -- SapAugment. Our policy adapts the augmentation parameters based on the training loss of the data samples. In the…
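
The intuition in the abstract (mild or no augmentation for high-loss samples, stronger augmentation for low-loss samples) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the function names, the use of a within-batch loss rank, the linear rank-to-strength map, and the Gaussian-noise perturbation are hypothetical stand-ins, not the learned policy or the augmentations described in the paper.

```python
import random


def loss_ranks(losses):
    """Scale each sample's loss rank to [0, 1] (0 = lowest loss, 1 = highest)."""
    n = len(losses)
    if n == 1:
        return [0.0]
    order = sorted(range(n), key=lambda i: losses[i])
    ranks = [0.0] * n
    for position, index in enumerate(order):
        ranks[index] = position / (n - 1)
    return ranks


def augmentation_strength(rank, max_strength=1.0):
    """Hypothetical fixed policy: strength falls linearly as the loss rank rises.

    The paper learns its policy; this linear map is only a placeholder.
    """
    return max_strength * (1.0 - rank)


def perturb(sample, strength, base_std=0.1, rng=random):
    """Add zero-mean Gaussian noise whose standard deviation scales with strength."""
    std = base_std * strength
    return [x + rng.gauss(0.0, std) for x in sample]


def sample_adaptive_augment(batch, losses, base_std=0.1, rng=random):
    """Perturb each sample with a strength chosen from its own training loss."""
    strengths = [augmentation_strength(r) for r in loss_ranks(losses)]
    augmented = [perturb(s, k, base_std, rng) for s, k in zip(batch, strengths)]
    return augmented, strengths


if __name__ == "__main__":
    batch = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
    losses = [0.2, 2.5, 0.9]  # the second sample is the hardest
    augmented, strengths = sample_adaptive_augment(batch, losses, rng=random.Random(0))
    print(strengths)
    print(augmented)
```

In this sketch the highest-loss sample in the batch receives strength 0 and is passed through unchanged, while the lowest-loss sample receives the full noise level. Ranking within the batch, rather than using raw loss values, keeps the strength scale stable as losses shrink during training; that choice is also an assumption of the sketch.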

Taxonomy

Topics: Speech Recognition and Synthesis · Natural Language Processing Techniques · Music and Audio Processing