VECL-TTS: Voice identity and Emotional style controllable Cross-Lingual Text-to-Speech
Ashishkumar Gudmalwar, Nirmesh Shah, Sai Akarsh, Pankaj Wasnik, Rajiv, Ratn Shah

TL;DR
VECL-TTS is an end-to-end cross-lingual TTS system that enables simultaneous control of voice identity and emotional style transfer across languages, improving speech synthesis quality with novel consistency losses.
Contribution
The paper introduces VECL-TTS, a novel system that jointly controls voice identity and emotional style in cross-lingual TTS using multilingual data and consistency losses.
Findings
Achieved 8.83% relative improvement over SOTA methods.
Effectively transfers voice and emotion across languages.
Enhances speech quality with content and style consistency losses.
Abstract
Despite the significant advancements in Text-to-Speech (TTS) systems, their full utilization in automatic dubbing remains limited. This task necessitates the extraction of voice identity and emotional style from a reference speech in a source language and subsequently transferring them to a target language using cross-lingual TTS techniques. While previous approaches have mainly concentrated on controlling voice identity within the cross-lingual TTS framework, there has been limited work on incorporating emotion and voice identity together. To this end, we introduce an end-to-end Voice Identity and Emotional Style Controllable Cross-Lingual (VECL) TTS system using multilingual speakers and an emotion embedding network. Moreover, we introduce content and style consistency losses to enhance the quality of synthesized speech further. The proposed system achieved an average relative…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsSpeech Recognition and Synthesis · Speech and dialogue systems · Phonetics and Phonology Research
