On Using SpecAugment for End-to-End Speech Translation

Parnia Bahar; Albert Zeyer; Ralf Schlüter; Hermann Ney

arXiv:1911.08876·cs.CL·November 21, 2019·23 cites

TL;DR

This paper explores SpecAugment, a simple data augmentation method applied directly to audio features, which improves end-to-end speech translation performance across different datasets and data scenarios.

Contribution

It demonstrates that SpecAugment improves end-to-end speech translation quality as measured by BLEU, in part by alleviating overfitting, with gains on both LibriSpeech En->Fr and IWSLT En->De and across different amounts of training data.

Findings

01

Up to +2.2% BLEU on LibriSpeech En->Fr

02

Up to +1.2% BLEU on IWSLT En->De

03

Effective in various data scenarios regardless of data size

Abstract

This work investigates a simple data augmentation technique, SpecAugment, for end-to-end speech translation. SpecAugment is a low-cost method applied directly to the audio input features, and it consists of masking blocks of frequency channels and/or time steps. We apply SpecAugment to end-to-end speech translation tasks and achieve up to +2.2% BLEU on LibriSpeech Audiobooks En->Fr and +1.2% BLEU on IWSLT TED-talks En->De by alleviating overfitting to some extent. We also examine the effectiveness of the method in a variety of data scenarios and show that it leads to significant improvements irrespective of the amount of training data.
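
The masking described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `spec_augment`, its parameters, the default mask widths, and the choice of zero as the fill value are all assumptions made for illustration, since this page does not state the paper's settings. Features are represented as a plain list of frames (time steps), each a list of frequency-channel values, so the example needs only the standard library.

```python
import random


def spec_augment(features, max_freq_width=8, max_time_width=20,
                 num_freq_masks=1, num_time_masks=1, rng=None):
    """Return a masked copy of `features` (list of frames x channels).

    Hypothetical helper: names, defaults, and the zero fill value are
    illustrative assumptions, not values taken from the paper.
    """
    rng = rng or random.Random()
    num_frames = len(features)
    num_channels = len(features[0]) if num_frames else 0
    # Copy so the caller's features are left untouched.
    out = [list(frame) for frame in features]

    # Frequency masking: zero a block of consecutive channels in every frame.
    for _ in range(num_freq_masks):
        width = min(rng.randint(0, max_freq_width), num_channels)
        start = rng.randint(0, num_channels - width)
        for frame in out:
            frame[start:start + width] = [0.0] * width

    # Time masking: zero a block of consecutive frames across all channels.
    for _ in range(num_time_masks):
        width = min(rng.randint(0, max_time_width), num_frames)
        start = rng.randint(0, num_frames - width)
        for t in range(start, start + width):
            out[t] = [0.0] * num_channels

    return out


# Usage: 100 frames of 40-dimensional features, masked with a fixed seed.
features = [[1.0] * 40 for _ in range(100)]
masked = spec_augment(features, rng=random.Random(0))
```

In a training loop, a function like this would typically be applied to each utterance on the fly, so that the model sees a differently masked version of the same audio in every epoch. The target translations are left unchanged.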

Taxonomy

Topics: Natural Language Processing Techniques · Music and Audio Processing · Speech Recognition and Synthesis