Probing Human Articulatory Constraints in End-to-End TTS with Reverse and Mismatched Speech-Text Directions
Parth Khadse, Sunil Kumar Kopparapu

TL;DR
This study investigates how human articulatory constraints influence end-to-end TTS systems by experimenting with different text-speech direction combinations, revealing that reverse speech training can improve speech quality.
Contribution
It introduces reverse and mismatched speech-text training directions in end-to-end TTS, showing that these approaches can enhance speech naturalness and intelligibility beyond conventional methods.
Findings
Reverse speech training improves speech fidelity.
Mismatched text-speech directions can enhance naturalness.
End-to-end TTS is highly data-driven.
Abstract
An end-to-end (e2e) text-to-speech (TTS) system is a deep architecture that learns to associate a text string with acoustic speech patterns from a curated dataset. It is expected that all aspects of speech production, such as phone duration, speaker characteristics, and intonation, among other things, are captured in the trained TTS model so that the synthesized speech is natural and intelligible. Human speech is complex, involving smooth transitions between articulatory configurations (ACs). Due to anatomical constraints, some ACs are challenging to mimic or transition between. In this paper, we experimentally study whether the constraints imposed by human anatomy have implications for training e2e-TTS systems. We experiment with two e2e-TTS architectures, namely Tacotron-2, an autoregressive model, and VITS-TTS, a non-autoregressive model. In this study, we build TTS…
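As an illustration of what "reverse" and "mismatched" speech-text directions could mean in practice, the sketch below builds the four possible pairings of forward or reversed text with forward or reversed audio for a single utterance. It is a minimal sketch under stated assumptions, not the authors' pipeline: reversing text character by character and reversing audio sample by sample are assumptions, and the names `reverse_text`, `reverse_speech`, and `direction_pairs` are hypothetical.

```python
from itertools import product


def reverse_text(text: str) -> str:
    # Assumption: text is reversed at the character level.
    return text[::-1]


def reverse_speech(samples: list[float]) -> list[float]:
    # Assumption: speech is reversed in time at the sample level.
    return samples[::-1]


def direction_pairs(text: str, samples: list[float]) -> dict:
    """Return all four (text direction, speech direction) pairings.

    Matched pairs are (forward, forward) and (reverse, reverse);
    the other two are mismatched directions.
    """
    pairs = {}
    for t_dir, s_dir in product(("forward", "reverse"), repeat=2):
        t = text if t_dir == "forward" else reverse_text(text)
        s = samples if s_dir == "forward" else reverse_speech(samples)
        pairs[(t_dir, s_dir)] = (t, s)
    return pairs


# Toy utterance: a short string and a three-sample "waveform".
pairs = direction_pairs("hello", [0.1, 0.2, 0.3])
for key, (t, s) in pairs.items():
    print(key, t, s)
```

Each of the four pairings would then serve as a separate training set for a TTS model, so that differences in synthesized speech can be attributed to the direction combination rather than to the data itself.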
Taxonomy
Topics: Speech Recognition and Synthesis · Voice and Speech Disorders · Phonetics and Phonology Research
