G-Augment: Searching for the Meta-Structure of Data Augmentation Policies for ASR

Gary Wang; Ekin D. Cubuk; Andrew Rosenberg; Shuyang Cheng; Ron J. Weiss; Bhuvana Ramabhadran; Pedro J. Moreno; Quoc V. Le; Daniel S. Park

arXiv:2210.10879·cs.LG·October 26, 2022

TL;DR

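G-Augment represents data augmentation policies for ASR as directed acyclic graphs and searches over that space. Given the same computational budget, its policies outperform SpecAugment policies found by random search, and it sets a new state-of-the-art result on the CHiME-6 evaluation set (30.7% WER).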

Contribution

The paper proposes a graph-based search space for data augmentation policies, defined as directed acyclic graphs (DAGs), and shows that searching this space improves ASR performance.

Findings

01

Given the same computational budget, G-Augment policies outperform SpecAugment policies obtained by random search on fine-tuning tasks on CHiME-6 and AMI.

02

G-Augment achieves a new state-of-the-art WER of 30.7% on the CHiME-6 evaluation set.

03

Policies show better transferability across training conditions.
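
For readers unfamiliar with the metric in finding 02: word error rate (WER) is conventionally the word-level edit distance between the reference transcript and the recognizer output, divided by the number of reference words. The snippet below is a minimal sketch of that standard definition. The function name `word_error_rate` is illustrative, and the whitespace tokenization is an assumption; the paper's own scoring pipeline (text normalization, scoring tool) is not described on this page.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()   # assumption: plain whitespace tokenization
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # substitution or exact match
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    wer = word_error_rate("the cat sat on the mat", "the cat sit on mat")
    print(f"WER = {wer:.1%}")
```

Because insertions count as errors, WER can exceed 100% when the hypothesis is much longer than the reference.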

Abstract

Data augmentation is a ubiquitous technique used to provide robustness to automatic speech recognition (ASR) training. However, even as so much of the ASR training process has become automated and more "end-to-end", the data augmentation policy (what augmentation functions to use, and how to apply them) remains hand-crafted. We present Graph-Augment, a technique to define the augmentation space as directed acyclic graphs (DAGs) and search over this space to optimize the augmentation policy itself. We show that given the same computational budget, policies produced by G-Augment are able to perform better than SpecAugment policies obtained by random search on fine-tuning tasks on CHiME-6 and AMI. G-Augment is also able to establish a new state-of-the-art ASR performance on the CHiME-6 evaluation set (30.7% WER). We further demonstrate that G-Augment policies show better transfer…
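
To make the idea of a DAG-structured augmentation policy concrete, here is a toy sketch. It is not the authors' search space or search algorithm: the operation set (`identity`, `time_mask`, `gain`), the `Edge` structure, the rule that each node picks one incoming edge at random, and the use of plain random sampling as the search procedure are all assumptions made for illustration. The "signal" is just a list of floats standing in for audio features.

```python
import random
from dataclasses import dataclass
from typing import Callable, List, Tuple

Signal = List[float]


def identity(x: Signal, magnitude: float, rng: random.Random) -> Signal:
    return list(x)


def time_mask(x: Signal, magnitude: float, rng: random.Random) -> Signal:
    """Zero one contiguous span covering a `magnitude` fraction of the signal."""
    width = min(len(x), int(len(x) * magnitude))
    if width == 0:
        return list(x)
    start = rng.randrange(len(x) - width + 1)
    return [0.0 if start <= i < start + width else v for i, v in enumerate(x)]


def gain(x: Signal, magnitude: float, rng: random.Random) -> Signal:
    return [v * (1.0 + magnitude) for v in x]


# Hypothetical operation set; a real ASR policy would use audio/spectrogram ops.
OPS = {"identity": identity, "time_mask": time_mask, "gain": gain}


@dataclass(frozen=True)
class Edge:
    parent: int       # index of an earlier node, so the graph is acyclic
    op: str           # key into OPS
    magnitude: float  # strength parameter passed to the op


# A policy is a list of nodes in topological order. Node 0 is the raw input;
# policy[k] holds the incoming edges of node k + 1. The last node is the output.
Policy = List[List[Edge]]


def apply_policy(policy: Policy, x: Signal, rng: random.Random) -> Signal:
    values = [list(x)]
    for incoming in policy:
        edge = rng.choice(incoming)  # assumption: sample one incoming edge
        values.append(OPS[edge.op](values[edge.parent], edge.magnitude, rng))
    return values[-1]


def sample_policy(num_nodes: int, max_in_edges: int, rng: random.Random) -> Policy:
    policy: Policy = []
    for node in range(1, num_nodes + 1):
        n_edges = rng.randint(1, max_in_edges)
        policy.append([
            Edge(rng.randrange(node), rng.choice(sorted(OPS)), rng.uniform(0.0, 0.5))
            for _ in range(n_edges)
        ])
    return policy


def random_search(score: Callable[[Policy], float], trials: int, rng: random.Random,
                  num_nodes: int = 3, max_in_edges: int = 2) -> Tuple[Policy, float]:
    """Keep the highest-scoring sampled policy (stand-in for a real search)."""
    best_policy, best_score = None, float("-inf")
    for _ in range(trials):
        policy = sample_policy(num_nodes, max_in_edges, rng)
        s = score(policy)
        if s > best_score:
            best_policy, best_score = policy, s
    return best_policy, best_score
```

In practice `score` would train or fine-tune an ASR model with the candidate policy and return a validation metric such as negative WER, which is what makes the computational budget the binding constraint in this kind of search.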

Taxonomy

Topics: Speech Recognition and Synthesis · Topic Modeling · Natural Language Processing Techniques

Methods: Random Search