Towards Selection of Text-to-speech Data to Augment ASR Training
Shuo Liu, Leda Sarı, Chunyang Wu, Gil Keren, Yuan Shangguan, Jay Mahadeokar, Ozlem Kalinli

TL;DR
This paper introduces a neural-network-based method for selecting synthetic TTS samples to augment ASR training, reducing the amount of TTS data needed while maintaining recognition accuracy.
Contribution
We propose a novel data selection approach using a neural similarity measure to optimize TTS data inclusion for ASR training, outperforming baseline methods.
Findings
Including synthetic samples that are considerably dissimilar to real speech, owing in part to lexical differences, is crucial for improving ASR performance.
The method reduces the required TTS data to below 30% of its original size.
Speech recognition accuracy stays comparable to that obtained with all TTS data.
Abstract
This paper presents a method for selecting appropriate synthetic speech samples from a given large text-to-speech (TTS) dataset as supplementary training data for an automatic speech recognition (ASR) model. We trained a neural network, which can be optimised using cross-entropy loss or ArcFace loss, to measure the similarity of synthetic data to real speech. We found that incorporating synthetic samples with considerable dissimilarity to real speech, owing in part to lexical differences, into ASR training is crucial for boosting recognition performance. Experimental results on LibriSpeech test sets indicate that, while maintaining the same speech recognition accuracy as when using all TTS data, our proposed solution can reduce the size of the TTS data to below 30% of its original size, which is superior to several baseline methods.
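The abstract names ArcFace (additive angular margin) loss as one way to optimise the similarity network. The sketch below illustrates only that loss for a single example, in plain Python. It is not the authors' implementation: the function names, the two-class toy setup, and the `scale` and `margin` values are assumptions chosen for illustration.

```python
import math

def l2_normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def arcface_loss(embedding, class_weights, target, scale=30.0, margin=0.5):
    """ArcFace loss for one example (illustrative sketch).

    The target-class logit is scale * cos(theta + margin), where theta is the
    angle between the normalised embedding and the normalised target weight.
    All other logits are scale * cos(theta_j). The loss is softmax
    cross-entropy over these logits. With margin=0 this reduces to a
    cosine-softmax cross-entropy.
    Note: this sketch does not special-case theta + margin > pi, which
    practical implementations usually handle.
    """
    x = l2_normalize(embedding)
    logits = []
    for j, w in enumerate(class_weights):
        cos_theta = sum(a * b for a, b in zip(x, l2_normalize(w)))
        cos_theta = max(-1.0, min(1.0, cos_theta))  # guard acos domain
        if j == target:
            cos_theta = math.cos(math.acos(cos_theta) + margin)
        logits.append(scale * cos_theta)
    # Numerically stable log-sum-exp for the softmax normaliser.
    top = max(logits)
    log_z = top + math.log(sum(math.exp(l - top) for l in logits))
    return log_z - logits[target]

# Hypothetical two-class setup (for example, class 0 and class 1 standing in
# for two kinds of speech); the embedding is aligned with class 0.
weights = [[1.0, 0.0], [0.0, 1.0]]
loss_plain = arcface_loss([2.0, 0.0], weights, target=0, scale=4.0, margin=0.0)
loss_margin = arcface_loss([2.0, 0.0], weights, target=0, scale=4.0, margin=0.5)
```

The margin makes the target class harder to satisfy, so `loss_margin` is larger than `loss_plain` for the same embedding. That penalty is what pushes embeddings of the same class closer together in angle during training.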
Taxonomy
Topics: Speech Recognition and Synthesis · Natural Language Processing Techniques · Topic Modeling
Methods: Test · Additive Angular Margin Loss
