Enhancing Low-Resource ASR through Versatile TTS: Bridging the Data Gap
Guanrou Yang, Fan Yu, Ziyang Ma, Zhihao Du, Zhifu Gao, Shiliang Zhang, Xie Chen

TL;DR
This paper shows that synthetic speech generated by versatile TTS models significantly improves low-resource ASR performance, and it analyzes how factors such as text diversity and speaker diversity affect the gains.
Contribution
It proposes using powerful TTS models for data augmentation in low-resource ASR and includes the first study of the impact of text diversity.
Findings
Consistent performance improvements across various low-resource datasets.
Text diversity in synthesized data significantly enhances ASR accuracy.
Analyzing factors like speaker diversity and data volume informs effective augmentation strategies.
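The three factors named in the findings (text diversity, speaker diversity, and data volume) can be pictured as knobs on a synthesis plan. The sketch below is only an illustration of that idea, not the authors' pipeline: the names `SynthesisRequest`, `plan_synthesis`, and `diversity_stats` are hypothetical, no real TTS model is called, and the round-robin pairing is an assumed strategy chosen for simplicity.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class SynthesisRequest:
    text: str        # transcript to be synthesized
    speaker_id: str  # hypothetical voice/prompt identifier for a TTS model


def plan_synthesis(texts, speaker_ids, n_utterances, seed=0):
    """Choose (text, speaker) pairs, spreading texts and speakers evenly.

    n_utterances controls data volume; the number of distinct texts and
    speakers passed in bounds the achievable diversity.
    """
    if not texts or not speaker_ids:
        raise ValueError("need at least one text and one speaker")
    rng = random.Random(seed)
    # Deduplicate so repeated sentences do not crowd out variety.
    unique_texts = sorted(set(texts))
    rng.shuffle(unique_texts)
    speakers = list(speaker_ids)
    rng.shuffle(speakers)
    plan = []
    for i in range(n_utterances):
        # Cycle through both lists; an item repeats only after all others were used.
        text = unique_texts[i % len(unique_texts)]
        speaker = speakers[i % len(speakers)]
        plan.append(SynthesisRequest(text, speaker))
    return plan


def diversity_stats(plan):
    """Summarize volume, text diversity, and speaker diversity of a plan."""
    return {
        "utterances": len(plan),
        "unique_texts": len({r.text for r in plan}),
        "unique_speakers": len({r.speaker_id for r in plan}),
    }
```

Calling `diversity_stats(plan_synthesis(texts, speakers, n))` before any audio is generated gives a quick check of how much variety a planned synthetic set would actually contain at a given volume.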
Abstract
While automatic speech recognition (ASR) systems have achieved remarkable performance with large-scale datasets, their efficacy remains inadequate in low-resource settings, encompassing dialects, accents, minority languages, and long-tail hotwords, domains with significant practical relevance. With the advent of versatile and powerful text-to-speech (TTS) models, capable of generating speech with human-level naturalness, expressiveness, and diverse speaker profiles, leveraging TTS for ASR data augmentation provides a cost-effective and practical approach to enhancing ASR performance. Comprehensive experiments on an unprecedentedly rich variety of low-resource datasets demonstrate consistent and substantial performance improvements, proving that the proposed method of enhancing low-resource ASR through a versatile TTS model is highly effective and has broad application prospects.…
Taxonomy
Topics: Fault Detection and Control Systems · Machine Learning and ELM · Network Packet Processing and Optimization
