Exploring an Inter-Pausal Unit (IPU) based Approach for Indic End-to-End TTS Systems
Anusha Prakash, Hema A Murthy

TL;DR
This paper introduces an inter-pausal unit (IPU) based approach for Indian language end-to-end TTS, improving prosody and reducing errors in conversational speech synthesis using Tacotron2 and FastSpeech2 architectures.
Contribution
It proposes an IPU-based method for Indian language TTS that enhances prosody and reduces errors, applicable to both autoregressive and non-autoregressive models.
Findings
IPU-based Tacotron2 reduces insertion and deletion errors.
The approach requires fewer computational resources.
The approach produces prosodically richer speech.
Abstract
Sentences in Indian languages are generally longer than those in English. Indian languages are also considered to be phrase-based, wherein semantically complete phrases are concatenated to make up sentences. Long utterances lead to poor training of text-to-speech models and result in poor prosody during synthesis. In this work, we explore an inter-pausal unit (IPU) based approach in the end-to-end (E2E) framework, focusing on synthesising conversational-style text. We consider both autoregressive Tacotron2 and non-autoregressive FastSpeech2 architectures in our study and perform experiments with three Indian languages, namely, Hindi, Tamil and Telugu. With the IPU-based Tacotron2 approach, we see a reduction in insertion and deletion errors in the synthesised audio, providing an alternative approach to the FastSpeech(2) network in terms of error reduction. The IPU-based approach…
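To make the notion of an inter-pausal unit concrete, here is a minimal sketch of how an utterance could be cut into IPUs from word-level time alignments. It is not the authors' procedure, which the abstract does not describe. It assumes an IPU is a run of words bounded by silences of at least `min_pause` seconds. The function name `split_into_ipus`, the 0.2 s threshold, and the placeholder tokens are all hypothetical and chosen only for illustration.

```python
from typing import List, Tuple

# Assumed input format: (token, start_sec, end_sec) for each aligned word.
Word = Tuple[str, float, float]


def split_into_ipus(words: List[Word], min_pause: float = 0.2) -> List[List[Word]]:
    """Group time-aligned words into inter-pausal units (IPUs).

    A new IPU starts whenever the silence between the end of the previous
    word and the start of the next word is at least `min_pause` seconds.
    The threshold is an illustrative assumption, not a value from the paper.
    """
    ipus: List[List[Word]] = []
    current: List[Word] = []
    for word in words:
        # Gap between this word's start and the previous word's end.
        if current and word[1] - current[-1][2] >= min_pause:
            ipus.append(current)
            current = []
        current.append(word)
    if current:
        ipus.append(current)
    return ipus


# Placeholder alignment: a 0.40 s pause separates "w2" from "w3".
aligned = [
    ("w1", 0.00, 0.40),
    ("w2", 0.42, 0.90),
    ("w3", 1.30, 1.70),
    ("w4", 1.72, 2.10),
]

for ipu in split_into_ipus(aligned):
    text = " ".join(token for token, _, _ in ipu)
    print(f"{text} ({ipu[0][1]:.2f}s to {ipu[-1][2]:.2f}s)")
```

In a setup like this, each resulting unit (its text plus the matching audio span) would be a shorter training example than the full sentence, which is the motivation the abstract gives for moving away from long utterances.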
Taxonomy
Topics: Network Time Synchronization Technologies · Fault Detection and Control Systems · Engineering and Test Systems
