Exploring speech style spaces with language models: Emotional TTS without emotion labels
Shreeram Suresh Chandra, Zongyang Du, Berrak Sisman

TL;DR
This paper introduces TEMOTTS, an emotional text-to-speech (E-TTS) framework that learns emotional styles implicitly through text awareness, improving emotional expressiveness without relying on explicit emotion labels.
Contribution
The study presents a new two-stage E-TTS framework that transfers knowledge from the linguistic space learned by BERT to the emotional style space constructed by global style tokens, without using emotion labels or text prompts.
Findings
Improves emotional accuracy in speech synthesis.
Enhances naturalness of generated speech.
One of the first studies to leverage the correlation between spoken content and emotion for E-TTS.
Abstract
Many frameworks for emotional text-to-speech (E-TTS) rely on human-annotated emotion labels that are often inaccurate and difficult to obtain. Learning emotional prosody implicitly presents a tough challenge due to the subjective nature of emotions. In this study, we propose a novel approach that leverages text awareness to acquire emotional styles without the need for explicit emotion labels or text prompts. We present TEMOTTS, a two-stage framework for E-TTS that is trained without emotion labels and is capable of inference without auxiliary inputs. Our proposed method performs knowledge transfer between the linguistic space learned by BERT and the emotional style space constructed by global style tokens. Our experimental results demonstrate the effectiveness of our proposed framework, showcasing improvements in emotional accuracy and naturalness. This is one of the first studies to…
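The knowledge transfer described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the single-head attention, the linear text-to-style map, the mean-squared-error objective, the dimensions, and all function names are assumptions made for illustration, and a random vector stands in for a BERT sentence embedding. The sketch shows a style embedding formed as an attention-weighted sum over a bank of style tokens, and a text-side predictor nudged toward that embedding so that, at inference, a style could be predicted from text alone.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(x - np.max(x))
    return e / e.sum()

def gst_style_embedding(reference, tokens, w_query):
    """Attend over a bank of style tokens using a reference embedding.

    Returns the style embedding (weighted sum of tokens) and the weights.
    """
    query = reference @ w_query                          # (token_dim,)
    scores = tokens @ query / np.sqrt(tokens.shape[1])   # (num_tokens,)
    weights = softmax(scores)
    return weights @ tokens, weights

def predict_style_from_text(text_embedding, w_text):
    """Hypothetical linear map from a text embedding to the style space."""
    return text_embedding @ w_text

def transfer_loss(predicted, target):
    """Mean squared error between predicted and target style embeddings."""
    return float(np.mean((predicted - target) ** 2))

# Assumed toy dimensions; real models would be far larger.
rng = np.random.default_rng(0)
ref_dim, text_dim, token_dim, num_tokens = 8, 16, 4, 5
tokens = rng.normal(size=(num_tokens, token_dim))   # style token bank
w_query = rng.normal(size=(ref_dim, token_dim))
w_text = rng.normal(size=(text_dim, token_dim))

reference = rng.normal(size=ref_dim)        # stand-in for a reference-audio encoding
text_embedding = rng.normal(size=text_dim)  # stand-in for a BERT sentence embedding

# Style embedding from the audio side, treated here as a fixed target.
style, weights = gst_style_embedding(reference, tokens, w_query)

# One gradient step on the text-side predictor toward that target.
predicted = predict_style_from_text(text_embedding, w_text)
loss_before = transfer_loss(predicted, style)
grad = np.outer(text_embedding, 2.0 * (predicted - style) / token_dim)
w_text = w_text - 0.01 * grad
loss_after = transfer_loss(predict_style_from_text(text_embedding, w_text), style)
```

In this toy setup only the text-side map is updated, so the token bank plays the role of an already-learned style space. Whether TEMOTTS freezes its style tokens, and which loss it uses, is not stated in the abstract.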
Taxonomy
Topics: Speech and dialogue systems
Methods: Attention Is All You Need · Dense Connections · Attention Dropout · Linear Layer · Weight Decay · Residual Connection · Adam · Dropout · Softmax
