The Impact of Prosodic Segmentation on Speech Synthesis of Spontaneous Speech
Julio Cesar Galdino, Sidney Evaldo Leal, Leticia Gabriella De Souza, Rodrigo de Freitas Lima, Antonio Nelson Fornari Mendes Moreira, Arnaldo Candido Junior, Miguel Oliveira Jr., Edresson Casanova, Sandra M. Aluísio

TL;DR
This study examines how explicit prosodic segmentation annotations, both manual and automatic, influence the naturalness and intelligibility of speech synthesized from spontaneous Brazilian Portuguese, highlighting the benefits of manual segmentation.
Contribution
The paper investigates how explicit prosodic segmentation affects the quality of spontaneous speech synthesis, comparing manual and automatic annotations with a non-autoregressive model, FastSpeech 2.
Findings
Training with prosodic segmentation yields slightly more intelligible and acoustically natural speech.
Manual segmentation introduces more variability, enhancing prosody.
Both approaches reproduce the expected nuclear accent patterns, with manual segmentation aligning more closely with natural contours.
Abstract
Spontaneous speech presents several challenges for speech synthesis, particularly in capturing the natural flow of conversation, including turn-taking, pauses, and disfluencies. Although speech synthesis systems have made significant progress in generating natural and intelligible speech, primarily through architectures that implicitly model prosodic features such as pitch, intensity, and duration, the construction of datasets with explicit prosodic segmentation and their impact on spontaneous speech synthesis remains largely unexplored. This paper evaluates the effects of manual and automatic prosodic segmentation annotations in Brazilian Portuguese on the quality of speech synthesized by a non-autoregressive model, FastSpeech 2. Experimental results show that training with prosodic segmentation produced slightly more intelligible and acoustically natural speech. While automatic…
Taxonomy
Topics: Phonetics and Phonology Research · Speech Recognition and Synthesis · Voice and Speech Disorders
