Applying Syntax–Prosody Mapping Hypothesis and Prosodic Well-Formedness Constraints to Neural Sequence-to-Sequence Speech Synthesis
Kei Furukawa, Takeshi Kishiyama, and Satoshi Nakamura

TL;DR
This paper introduces a neural TTS model that incorporates phonological and syntactic constraints to better reproduce linguistic pitch phenomena like downstep and rhythmic boost, aligning speech synthesis with linguistic theories.
Contribution
The study proposes a neural TTS approach that integrates the syntax–prosody mapping hypothesis (SPMH) and prosodic well-formedness constraints to improve the reproduction of phonological phenomena.
Findings
Successfully synthesized pitch patterns matching linguistic phenomena
Model generalizes to unseen phonological phenomena
Improves linguistic naturalness in speech synthesis
Abstract
End-to-end text-to-speech synthesis (TTS), which generates speech sounds directly from strings of texts or phonemes, has improved the quality of speech synthesis over the conventional TTS. However, most previous studies have been evaluated based on subjective naturalness and have not objectively examined whether they can reproduce pitch patterns of phonological phenomena such as downstep, rhythmic boost, and initial lowering that reflect syntactic structures in Japanese. These phenomena can be linguistically explained by phonological constraints and the syntax–prosody mapping hypothesis (SPMH), which assumes projections from syntactic structures to phonological hierarchy. Although some experiments in psycholinguistics have verified the validity of the SPMH, it is crucial to investigate whether it can be implemented in TTS. To synthesize linguistic phenomena involving…
Taxonomy
Topics: Speech Recognition and Synthesis · Natural Language Processing Techniques · Topic Modeling
