Spectral Study of the Vocal Tract in Vowel Synthesis: A Comparison between 1D and 3D Acoustic Analysis
Negar M. Harandi, Daniel Aalto, Antti Hannukainen, Jarmo Malinen, Sidney Fels

TL;DR
This paper compares 1D and 3D acoustic analysis methods for vowel synthesis, highlighting differences in formant frequency predictions and assessing the limitations of simplified models versus detailed 3D simulations.
Contribution
It introduces a comparison between 1D and 3D acoustic analysis techniques for vocal tract modeling using Helmholtz resonances and formant frequencies.
Findings
3D FEM resonances differ from 1D formant estimates
Discrepancies highlight limitations of simplified models
Comparison informs improvements in speech synthesis
Abstract
A state-of-the-art 1D acoustic synthesizer was previously developed and coupled to speaker-specific biomechanical models of the oropharynx in ArtiSynth. As expected, the formant frequencies of the synthesized vowel sounds differed from those of the recorded audio. This discrepancy was hypothesized to stem from the simplified geometry of the vocal tract model as well as the one-dimensional implementation of the Navier-Stokes equations. In this paper, we calculate the Helmholtz resonances of our vocal tract geometries using the 3D finite element method (FEM) and compare them with the formant frequencies obtained from the 1D method and from the audio. We hope this comparison helps clarify the limitations of our current models and/or speech synthesizer.
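To give a feel for the quantities being compared, the sketch below computes the resonances of the simplest textbook idealization of a vocal tract: a lossless uniform tube, closed at the glottis and open at the lips, whose resonances are f_k = (2k - 1) c / (4 L). This is not the paper's 1D synthesizer or its 3D FEM computation. The function name, the sound speed of 350 m/s, and the tract length of 0.175 m are all illustrative assumptions.

```python
def uniform_tube_resonances(length_m, n_modes=3, sound_speed=350.0):
    """Resonance frequencies (Hz) of a lossless uniform tube that is
    closed at one end and open at the other (quarter-wave resonator).

    Hypothetical helper for illustration only; sound_speed (m/s) and the
    uniform-tube geometry are assumptions, not values from the paper.
    """
    if length_m <= 0:
        raise ValueError("length_m must be positive")
    # k-th resonance: odd multiples of c / (4 L)
    return [(2 * k - 1) * sound_speed / (4.0 * length_m)
            for k in range(1, n_modes + 1)]


if __name__ == "__main__":
    # Assumed tract length of 0.175 m, chosen only for round numbers.
    for k, f in enumerate(uniform_tube_resonances(0.175), start=1):
        print(f"resonance {k}: {f:.0f} Hz")
```

Real vocal tract geometries are neither uniform nor straight, so both the 1D formant estimates and the 3D Helmholtz resonances discussed in the abstract depart from these idealized values; the uniform tube only serves as a reference point.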
Taxonomy
Topics: Voice and Speech Disorders · Speech Recognition and Synthesis · Phonetics and Phonology Research
