Prosody-controllable spontaneous TTS with neural HMMs
Harm Lameris, Shivam Mehta, Gustav Eje Henter, Joakim Gustafson, Éva Székely

TL;DR
This paper introduces a neural HMM-based TTS system with utterance-level prosody control, designed to synthesize spontaneous speech, including disfluencies and expressive phenomena, from small, irregular datasets without degrading synthesis quality.
Contribution
It presents a novel TTS architecture that combines prosody control with neural HMMs, enabling realistic spontaneous speech synthesis with limited data.
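The summary does not say which prosodic quantities the utterance-level control uses or how the control enters the network. The following is a minimal sketch of one simple way to do it. It assumes the control vector holds three hypothetical utterance statistics (mean log-F0, log-F0 spread, and phones per second) and that this vector is appended to every encoder state before the emission distributions are predicted. All function names are illustrative, not the authors' code.

```python
import math
from typing import List, Sequence


def utterance_prosody_features(
    f0_hz: Sequence[float], n_phones: int, duration_s: float
) -> List[float]:
    """Hypothetical utterance-level prosody descriptors (an assumption, not the paper's set).

    Unvoiced frames are assumed to be marked with f0 == 0 and are ignored.
    Returns [mean log-F0, standard deviation of log-F0, phones per second].
    """
    voiced = [math.log(f) for f in f0_hz if f > 0]
    if not voiced:
        raise ValueError("utterance has no voiced frames")
    mean = sum(voiced) / len(voiced)
    std = math.sqrt(sum((v - mean) ** 2 for v in voiced) / len(voiced))
    rate = n_phones / duration_s
    return [mean, std, rate]


def condition_encoder_states(
    states: List[List[float]], control: List[float]
) -> List[List[float]]:
    """Append the same utterance-level control vector to every encoder state."""
    return [list(state) + list(control) for state in states]
```

Because the same vector is repeated across the whole utterance, changing one value (for example, raising the mean log-F0) shifts the entire utterance at once. That is the sense in which the control is utterance-level. Finer-grained control would need per-phone or per-frame values instead.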
Findings
Prosody control accurately modulates speech intonation and rhythm.
Synthesized speech retains natural disfluencies and expressive features.
The system reproduces spontaneous speech phenomena such as creaky voice.
Abstract
Spontaneous speech has many affective and pragmatic functions that are interesting and challenging to model in TTS. However, the presence of reduced articulation, fillers, repetitions, and other disfluencies in spontaneous speech makes the text and acoustics less aligned than in read speech, which is problematic for attention-based TTS. We propose a TTS architecture that can rapidly learn to speak from small and irregular datasets, while also reproducing the diversity of expressive phenomena present in spontaneous speech. Specifically, we add utterance-level prosody control to an existing neural HMM-based TTS system which is capable of stable, monotonic alignments for spontaneous speech. We objectively evaluate control accuracy and perform perceptual tests that demonstrate that prosody control does not degrade synthesis quality. To exemplify the power of combining prosody control and…
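The abstract credits the neural HMM backbone with stable, monotonic alignments. The following minimal sketch shows why. It assumes a plain left-to-right HMM with only "stay" and "advance" transitions and computes its likelihood with the generic textbook forward algorithm. This is not the authors' implementation, and the function names are hypothetical. Because no transition moves backwards or skips a state, every alignment path the recursion sums over visits the input states strictly in order.

```python
import math
from typing import List


def logsumexp(a: float, b: float) -> float:
    """Numerically stable log(exp(a) + exp(b))."""
    if a == -math.inf:
        return b
    if b == -math.inf:
        return a
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))


def left_right_forward(log_emit: List[List[float]], log_stay: List[float]) -> float:
    """Log-likelihood of T frames under a left-to-right HMM with N states.

    log_emit[t][j]: log p(frame t | state j), e.g. from a neural emission model.
    log_stay[j]: log probability of staying in state j (must be < 0, i.e. p < 1);
    advancing to state j + 1 has probability 1 - p_stay.
    Paths start in state 0 and must end in state N - 1, so every alignment is
    monotonic: states are visited in order and none is skipped.
    """
    T, N = len(log_emit), len(log_stay)
    log_adv = [math.log1p(-math.exp(s)) for s in log_stay]
    alpha = [-math.inf] * N
    alpha[0] = log_emit[0][0]
    for t in range(1, T):
        new = [-math.inf] * N
        for j in range(N):
            stay = alpha[j] + log_stay[j]
            move = alpha[j - 1] + log_adv[j - 1] if j > 0 else -math.inf
            new[j] = logsumexp(stay, move) + log_emit[t][j]
        alpha = new
    return alpha[N - 1]
```

In a neural HMM, the emission terms come from a network conditioned on the input text, and the model is trained by maximising this kind of forward-algorithm likelihood. Since the chain can only stay or advance, a spontaneous utterance with fillers or long pauses may linger in a state but cannot jump back or skip ahead in the input, which is the failure mode the abstract attributes to attention-based TTS.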
