On-the-Fly Aligned Data Augmentation for Sequence-to-Sequence ASR

Tsz Kin Lam; Mayumi Ohta; Shigehiko Schamoni; Stefan Riezler

arXiv:2104.01393·cs.CL·June 13, 2023

TL;DR

This paper introduces Aligned Data Augmentation (ADA), an on-the-fly method for ASR training that uses alignment information to replace transcript tokens together with their aligned speech segments, producing previously unseen training pairs that improve robustness and accuracy.

Contribution

The paper presents a novel ADA method that leverages alignment information to generate effective augmented training data for sequence-to-sequence ASR models.

Findings

01. ADA achieves about 9-23% relative WER improvement over SpecAugment on LibriSpeech 100h.

02. ADA achieves about 4-15% relative WER reduction on LibriSpeech 960h.

03. The augmented pairs inject speaker variation into the training data and improve robustness in ASR training.
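
The percentages above are relative figures, not absolute WER points. A minimal sketch of how a relative WER reduction is usually computed follows; the function name and the numbers in the usage line are hypothetical illustrations and are not taken from the paper.

```python
def relative_wer_reduction(baseline_wer: float, new_wer: float) -> float:
    """Return the relative WER reduction in percent.

    Both arguments are word error rates in the same unit (e.g. percent).
    A positive result means the new system is better than the baseline.
    """
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return 100.0 * (baseline_wer - new_wer) / baseline_wer


# Hypothetical numbers for illustration only (not results from the paper):
# a baseline at 10.0% WER and a new system at 9.0% WER.
reduction = relative_wer_reduction(10.0, 9.0)
print(f"{reduction:.1f}% relative WER reduction")
```

With these made-up values, a drop of one absolute WER point corresponds to a 10% relative reduction, which is why relative figures look larger than absolute ones.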

Abstract

We propose an on-the-fly data augmentation method for automatic speech recognition (ASR) that uses alignment information to generate effective training samples. Our method, called Aligned Data Augmentation (ADA) for ASR, replaces transcribed tokens and the speech representations in an aligned manner to generate previously unseen training pairs. The speech representations are sampled from an audio dictionary that has been extracted from the training corpus and inject speaker variations into the training examples. The transcribed tokens are either predicted by a language model such that the augmented data pairs are semantically close to the original data, or randomly sampled. Both strategies result in training pairs that improve robustness in ASR training. Our experiments on a Seq-to-Seq architecture show that ADA can be applied on top of SpecAugment, and achieves about 9-23% and 4-15%…
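
The aligned replacement described in the abstract can be sketched roughly as follows. This is an illustrative toy, not the authors' implementation (the official code is in StatNLP/ada4asr and may differ in every detail). The data layout (one list of feature frames per token), the function names, the `replace_prob` parameter, and the `propose_token` callable are all assumptions made for this sketch. `propose_token` stands in for either strategy named in the abstract: a language-model prediction or random sampling.

```python
import random
from collections import defaultdict


def build_audio_dictionary(corpus):
    """Collect aligned speech segments per token from the training corpus.

    corpus: iterable of (tokens, segments) pairs, where segments[i] holds the
    feature frames aligned to tokens[i] (assumed layout for this sketch).
    """
    audio_dict = defaultdict(list)
    for tokens, segments in corpus:
        for tok, seg in zip(tokens, segments):
            audio_dict[tok].append(seg)
    return audio_dict


def random_token_proposer(vocab, rng=random):
    """Hypothetical proposer that ignores context and samples a token at random."""
    def propose(tokens, position):
        return rng.choice(vocab)
    return propose


def ada_augment(tokens, segments, audio_dict, propose_token,
                replace_prob=0.3, rng=random):
    """Replace tokens and their aligned speech segments together.

    For each position chosen with probability replace_prob (an assumed
    hyperparameter), a replacement token is proposed and a matching segment is
    sampled from the audio dictionary, so text and speech stay aligned.
    """
    new_tokens, new_segments = [], []
    for i, (tok, seg) in enumerate(zip(tokens, segments)):
        if rng.random() < replace_prob:
            candidate = propose_token(tokens, i)
            # Only replace when the dictionary has audio for the candidate.
            if candidate in audio_dict:
                tok = candidate
                seg = rng.choice(audio_dict[candidate])
        new_tokens.append(tok)
        new_segments.append(seg)
    return new_tokens, new_segments


# Toy usage: each segment is a list of one-dimensional "frames".
toy_corpus = [
    (["the", "cat"], [[[0.1]], [[0.2], [0.3]]]),
    (["the", "dog"], [[[0.4]], [[0.5]]]),
]
toy_dict = build_audio_dictionary(toy_corpus)
proposer = random_token_proposer(sorted(toy_dict))
aug_tokens, aug_segments = ada_augment(
    ["the", "cat"], [[[0.1]], [[0.2], [0.3]]], toy_dict, proposer
)
```

Because a segment sampled from the dictionary may come from a different utterance, and so possibly a different speaker, this kind of replacement is one way the augmented pairs can carry speaker variation. To try the language-model strategy, a function that predicts a token from the surrounding context could be passed as `propose_token` instead.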

Code & Models

Repositories

StatNLP/ada4asr
PyTorch · Official

Taxonomy

Methods: Aligned Data Augmentation (ADA)