Enhancing Out-of-Vocabulary Performance of Indian TTS Systems for Practical Applications through Low-Effort Data Strategies

Srija Anand; Praveen Srinivasa Varadhan; Ashwin Sankar; Giri Raju; Mitesh M. Khapra

arXiv:2407.13435·cs.CL·July 19, 2024



TL;DR

This paper addresses the challenge of out-of-vocabulary (OOV) words in low-resource Indian TTS systems by proposing a low-cost data collection strategy that improves OOV performance without compromising voice quality.

Contribution

It introduces a practical, low-effort data augmentation method using volunteer recordings to enhance OOV handling in Hindi and Tamil TTS systems.

Findings

1. The OOV benchmark reveals poor performance of current TTS systems.
2. Volunteer-recorded data improves OOV word intelligibility.
3. Inexpensive data collection maintains voice quality and in-domain performance.

Abstract

Publicly available TTS datasets for low-resource languages like Hindi and Tamil typically contain 10-20 hours of data, leading to poor vocabulary coverage. This limitation becomes evident in downstream applications, where domain-specific vocabulary, coupled with frequent code-mixing with English, results in many OOV words. To highlight this problem, we create a benchmark containing OOV words from several real-world applications. Indeed, state-of-the-art Hindi and Tamil TTS systems perform poorly on this OOV benchmark, as indicated by intelligibility tests. To improve the model's OOV performance, we propose a low-effort and economically viable strategy to obtain more training data. Specifically, we propose using volunteers, as opposed to high-quality voice artists, to record words containing character bigrams unseen in the training data. We show that using such inexpensive data, the model's…
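The selection criterion in the abstract (record words containing character bigrams unseen in the training data) can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' code: bigrams are taken over Unicode code points within whitespace-separated words, and the function names, the `budget` parameter and the greedy coverage step in `select_words_for_recording` are hypothetical additions for illustration only.

```python
from __future__ import annotations

from typing import Iterable


def char_bigrams(word: str) -> set[str]:
    """Adjacent character pairs (over Unicode code points) in one word."""
    return {word[i:i + 2] for i in range(len(word) - 1)}


def training_bigrams(transcripts: Iterable[str]) -> set[str]:
    """Every character bigram that occurs in any word of the training transcripts."""
    seen: set[str] = set()
    for line in transcripts:
        for word in line.split():
            seen |= char_bigrams(word)
    return seen


def words_with_unseen_bigrams(candidates: Iterable[str], seen: set[str]) -> list[str]:
    """Keep only candidate words that contain at least one bigram absent from training."""
    return [w for w in candidates if char_bigrams(w) - seen]


def select_words_for_recording(candidates: Iterable[str], seen: set[str], budget: int) -> list[str]:
    """Hypothetical greedy step: repeatedly pick the word that adds the most
    still-uncovered bigrams, stopping at `budget` words or when nothing new is gained."""
    covered = set(seen)
    pool = list(dict.fromkeys(candidates))  # de-duplicate while keeping order
    chosen: list[str] = []
    while pool and len(chosen) < budget:
        best = max(pool, key=lambda w: len(char_bigrams(w) - covered))
        gain = char_bigrams(best) - covered
        if not gain:
            break
        chosen.append(best)
        covered |= gain
        pool.remove(best)
    return chosen


if __name__ == "__main__":
    seen = training_bigrams(["the cat sat", "a hat"])
    print(words_with_unseen_bigrams(["chat", "dog", "hat", "cog"], seen))
    print(select_words_for_recording(["chat", "dog", "hat", "cog"], seen, budget=3))
```

With these toy inputs, "hat" is filtered out because all of its bigrams already occur in training. For Hindi or Tamil, code-point bigrams pair consonants with vowel signs and viramas; whether the paper counts bigrams over code points or over another character unit is not stated here.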


Code & Models

Repositories

AI4Bharat/IndicOOV (official)


Taxonomy

Topics: Blind Source Separation Techniques · Advanced Algorithms and Applications · Speech and Audio Processing