Make More of Your Data: Minimal Effort Data Augmentation for Automatic Speech Recognition and Translation
Tsz Kin Lam, Shigehiko Schamoni, Stefan Riezler

TL;DR
This paper presents a simple, cost-effective data augmentation method for speech recognition and translation that involves concatenating existing data to improve model performance across multiple datasets and languages.
Contribution
The study introduces a minimal effort data augmentation technique using concatenation, which enhances speech recognition and translation models without requiring speaker information.
Findings
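WER of 2.83 (test-clean) and 6.87 (test-other) on the LibriSpeech-960h test sets, and 2.55 and 6.27 when combined with shallow fusion.
Up to 0.9 WER improvement on the ASR part of CoVoST-2 for four non-English languages, with gains that depend strongly on the size of the original training data.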
Method effective across speech recognition and translation tasks.
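For readers unfamiliar with the metric, the following is a minimal sketch of how word error rate (WER) is commonly computed: the word-level edit distance between reference and hypothesis, divided by the number of reference words. It assumes whitespace tokenization and reports a percentage; the function name `wer` is illustrative and is not the paper's evaluation code.

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate in percent, via word-level Levenshtein distance.

    Assumption: words are split on whitespace, with no text normalization.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] = edit distance between the first i-1 reference words
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words
        for j, h in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if r == h else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr

    return 100.0 * prev[-1] / len(ref)
```

Under this definition, an improvement of 0.9 WER would mean an absolute drop of 0.9 percentage points, assuming the reported figures are percentages as is conventional.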
Abstract
Data augmentation is a technique to generate new training data based on existing data. We evaluate the simple and cost-effective method of concatenating the original data examples to build new training instances. Continued training with such augmented data is able to improve off-the-shelf Transformer and Conformer models that were optimized on the original data only. We demonstrate considerable improvements on the LibriSpeech-960h test sets (WER 2.83 and 6.87 for test-clean and test-other), which carry over to models combined with shallow fusion (WER 2.55 and 6.27). Our method of continued training also leads to improvements of up to 0.9 WER on the ASR part of CoVoST-2 for four non-English languages, and we observe that the gains are highly dependent on the size of the original training data. We compare different concatenation strategies and find that our method does not need speaker…
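The concatenation idea described in the abstract can be sketched roughly as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation: the `Utterance` type, the function names, uniform random pairing, and joining transcripts with a single space are all hypothetical choices made for the example.

```python
import random
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Utterance:
    samples: List[float]  # waveform samples (hypothetical representation)
    text: str             # transcript (ASR) or translation (ST)


def concat_pair(a: Utterance, b: Utterance) -> Utterance:
    """Build one new training instance by joining audio and text in order."""
    return Utterance(samples=a.samples + b.samples, text=f"{a.text} {b.text}")


def augment_by_concatenation(
    data: Sequence[Utterance], n_new: int, seed: int = 0
) -> List[Utterance]:
    """Return the original examples plus n_new concatenated pairs.

    Assumption: pairs are drawn uniformly at random, using no speaker
    metadata. Requires at least two examples in `data`.
    """
    rng = random.Random(seed)
    pool = list(data)
    augmented = list(pool)
    for _ in range(n_new):
        a, b = rng.sample(pool, 2)  # two distinct examples
        augmented.append(concat_pair(a, b))
    return augmented
```

In the setting the abstract describes, a model already trained on the original data would then continue training on the augmented set; that training loop is outside the scope of this sketch.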
Taxonomy
Topics: Speech Recognition and Synthesis · Natural Language Processing Techniques · Topic Modeling
Methods: Multi-Head Attention · Attention Is All You Need · Test · Linear Layer · Softmax · Adam · Label Smoothing · Position-Wise Feed-Forward Layer · Dense Connections · Absolute Position Encodings
