ELAICHI: Enhancing Low-resource TTS by Addressing Infrequent and Low-frequency Character Bigrams
Srija Anand, Praveen Srinivasa Varadhan, Mehak Singal, Mitesh M. Khapra

TL;DR
This paper introduces methods to improve low-resource TTS systems by utilizing related languages, denoising ASR data, and knowledge distillation, significantly enhancing speech intelligibility for languages with limited high-quality data.
Contribution
The paper presents a novel multi-faceted approach combining cross-lingual data, data enhancement, and model distillation to improve low-resource TTS performance.
Findings
Significant reduction in intelligibility issues for Hindi TTS.
Effective use of related language data improves synthesis quality.
Denoising ASR data enhances training effectiveness.
Abstract
Recent advancements in Text-to-Speech (TTS) technology have led to natural-sounding speech for English, primarily due to the availability of large-scale, high-quality web data. However, many other languages lack access to such resources, relying instead on limited studio-quality data. This scarcity results in synthesized speech that often suffers from intelligibility issues, particularly with low-frequency character bigrams. In this paper, we propose three solutions to address this challenge. First, we leverage high-quality data from linguistically or geographically related languages to improve TTS for the target language. Second, we utilize low-quality Automatic Speech Recognition (ASR) data recorded in non-studio environments, which is refined using denoising and speech enhancement models. Third, we apply knowledge distillation from large-scale models using synthetic data to generate…
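The abstract attributes intelligibility problems to low-frequency character bigrams. As a rough illustration of what that means, the sketch below counts character bigrams in a small text corpus and lists those seen fewer than a chosen number of times. The function names, the `min_count` threshold, and the toy English corpus are assumptions for illustration only; the excerpt does not say how the authors define or threshold bigrams. The code also treats each Unicode code point as a character, which is a simplification for scripts such as Devanagari, where one written unit may span several code points.

```python
from collections import Counter


def char_bigram_counts(sentences):
    """Count adjacent character pairs within each whitespace-separated word."""
    counts = Counter()
    for sentence in sentences:
        for word in sentence.split():
            # Slide a window of width 2 over the word (code points, not graphemes).
            counts.update(word[i:i + 2] for i in range(len(word) - 1))
    return counts


def low_frequency_bigrams(counts, min_count=2):
    """Return bigrams seen fewer than min_count times, sorted alphabetically."""
    return sorted(bg for bg, c in counts.items() if c < min_count)


# Toy corpus (hypothetical); a real check would use the TTS training transcripts.
corpus = ["banana bandana", "cabana"]
counts = char_bigram_counts(corpus)
rare = low_frequency_bigrams(counts, min_count=2)
print(rare)
```

A list like `rare` could then be used to decide which extra sentences, from related languages or cleaned ASR data, would add coverage for the bigrams the studio data lacks.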
Taxonomy
Topics: Natural Language Processing Techniques · Handwritten Text Recognition Techniques · Text and Document Classification Technologies
Methods: Knowledge Distillation
