ELAICHI: Enhancing Low-resource TTS by Addressing Infrequent and Low-frequency Character Bigrams

Srija Anand; Praveen Srinivasa Varadhan; Mehak Singal; Mitesh M. Khapra

arXiv:2410.17901 · cs.CL · October 24, 2024

Open Access

TL;DR

This paper introduces three methods for improving low-resource TTS systems: using data from related languages, denoising ASR data, and distilling knowledge from large-scale models. Together they significantly improve speech intelligibility for languages with limited high-quality data.

Contribution

The paper presents a novel multi-faceted approach combining cross-lingual data, data enhancement, and model distillation to improve low-resource TTS performance.

Findings

01. Significant reduction in intelligibility issues for Hindi TTS.

02. Effective use of related language data improves synthesis quality.

03. Denoising ASR data enhances training effectiveness.

Abstract

Recent advancements in Text-to-Speech (TTS) technology have led to natural-sounding speech for English, primarily due to the availability of large-scale, high-quality web data. However, many other languages lack access to such resources, relying instead on limited studio-quality data. This scarcity results in synthesized speech that often suffers from intelligibility issues, particularly with low-frequency character bigrams. In this paper, we propose three solutions to address this challenge. First, we leverage high-quality data from linguistically or geographically related languages to improve TTS for the target language. Second, we utilize low-quality Automatic Speech Recognition (ASR) data recorded in non-studio environments, which is refined using denoising and speech enhancement models. Third, we apply knowledge distillation from large-scale models using synthetic data to generate…
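
To make the notion of a low-frequency character bigram concrete, the following is a minimal sketch, not the authors' procedure, of how one might count character bigrams in a text corpus and list those at or below a count threshold. The function names, the toy corpus, and the `max_count` threshold are all hypothetical. The sketch operates on Unicode code points, which for scripts such as Devanagari only approximates characters as a reader perceives them.

```python
from collections import Counter


def char_bigram_counts(sentences):
    """Count within-word character bigrams across a list of sentences."""
    counts = Counter()
    for sentence in sentences:
        for word in sentence.split():
            # Slide a window of two code points over each word.
            counts.update(word[i:i + 2] for i in range(len(word) - 1))
    return counts


def low_frequency_bigrams(counts, max_count=1):
    """Return bigrams seen at most `max_count` times (hypothetical threshold)."""
    return sorted(b for b, c in counts.items() if c <= max_count)


# Toy corpus, for illustration only.
corpus = ["banana bandana", "cabana"]
counts = char_bigram_counts(corpus)
rare = low_frequency_bigrams(counts, max_count=1)
print(counts.most_common(3))
print(rare)
```

Bigrams that appear in `rare` are the ones a TTS training set covers poorly, so a listing of this kind could indicate where additional data from related languages, enhanced ASR recordings, or synthetic speech would matter most.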

Taxonomy

Topics: Natural Language Processing Techniques · Handwritten Text Recognition Techniques · Text and Document Classification Technologies

Methods: Knowledge Distillation