Enhancing Low-Resource ASR through Versatile TTS: Bridging the Data Gap

Guanrou Yang; Fan Yu; Ziyang Ma; Zhihao Du; Zhifu Gao; Shiliang Zhang; Xie Chen

arXiv:2410.16726·eess.AS·October 23, 2024

Open Access

TL;DR

This paper shows that synthetic speech generated by versatile TTS models substantially improves low-resource ASR performance, and it analyzes how factors such as text diversity and speaker diversity affect the gains.

Contribution

It leverages powerful TTS models for data augmentation in low-resource ASR and includes the first study of how text diversity affects the results.

Findings

01

Consistent performance improvements across various low-resource datasets.

02

Text diversity in synthesized data significantly enhances ASR accuracy.

03

Analyzing factors like speaker diversity and data volume informs effective augmentation strategies.

Abstract

While automatic speech recognition (ASR) systems have achieved remarkable performance with large-scale datasets, their efficacy remains inadequate in low-resource settings, encompassing dialects, accents, minority languages, and long-tail hotwords, domains with significant practical relevance. With the advent of versatile and powerful text-to-speech (TTS) models, capable of generating speech with human-level naturalness, expressiveness, and diverse speaker profiles, leveraging TTS for ASR data augmentation provides a cost-effective and practical approach to enhancing ASR performance. Comprehensive experiments on an unprecedentedly rich variety of low-resource datasets demonstrate consistent and substantial performance improvements, proving that the proposed method of enhancing low-resource ASR through a versatile TTS model is highly effective and has broad application prospects.…

Taxonomy

Topics: Fault Detection and Control Systems · Machine Learning and ELM · Network Packet Processing and Optimization