Enhancing Acoustic-to-Articulatory Speech Inversion by Incorporating Nasality
Saba Tabatabaee, Suzanne Boyce, Liran Oren, Mark Tiede, Carol Espy-Wilson

TL;DR
This paper improves acoustic-to-articulatory speech inversion by incorporating nasalance measures. This yields better estimates of velum movement and of the other articulatory and source features, and it deepens understanding of speech production mechanisms.
Contribution
It introduces a synergistic model that combines oral tract variables (TVs) and source features with nasalance, and it outperforms baseline models in speech inversion accuracy.
Findings
Synergistic model improves oral TV estimation by 5%.
Synergistic model improves nasalance estimation by 9%.
Nasalance reliably recovers velum movement patterns.
Abstract
Speech is produced through the coordination of the vocal tract's constricting organs: lips, tongue, velum, and glottis. Previous work developed Speech Inversion (SI) systems that recover acoustic-to-articulatory mappings for lip and tongue constrictions, called oral tract variables (TVs). These systems were later enhanced by adding source information (periodic and aperiodic energies, and fundamental frequency, F0) as proxies for glottal control. A comparison of nasometric measures with high-speed nasopharyngoscopy showed that nasalance can serve as ground truth, and that an SI system trained on it reliably recovers velum movement patterns for American English speakers. Here, two SI training approaches are compared: baseline models that estimate oral TVs and nasalance independently, and a synergistic model that combines oral TVs and source features with nasalance. The synergistic model shows relative…
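Nasalance is conventionally defined as nasal acoustic energy expressed as a percentage of total (nasal plus oral) energy, measured with separate nasal and oral microphones divided by a baffle. The sketch below computes a frame-wise nasalance track of this kind. It is a minimal illustration, not the authors' pipeline. It assumes time-aligned nasal and oral channels, uses frame RMS amplitude as the energy measure, and takes hypothetical frame settings (400/160 samples, i.e. 25 ms/10 ms at 16 kHz). Commercial nasometers usually band-pass filter both channels first, and that step is omitted here.

```python
import numpy as np

def frame_rms(signal, frame_len, hop):
    """Root-mean-square amplitude of each (non-padded) frame of a 1-D signal."""
    if len(signal) < frame_len:
        raise ValueError("signal shorter than one frame")
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop : i * hop + frame_len] for i in range(n_frames)])
    return np.sqrt(np.mean(frames ** 2, axis=1))

def nasalance_track(nasal, oral, frame_len=400, hop=160, eps=1e-12):
    """Frame-wise nasalance in percent: 100 * A_n / (A_n + A_o).

    frame_len and hop are hypothetical defaults (25 ms / 10 ms at 16 kHz);
    no band-pass filtering is applied (an assumption of this sketch).
    """
    a_n = frame_rms(np.asarray(nasal, dtype=float), frame_len, hop)
    a_o = frame_rms(np.asarray(oral, dtype=float), frame_len, hop)
    return 100.0 * a_n / (a_n + a_o + eps)  # eps avoids division by zero in silence

# Usage: synthetic 200 Hz tones where the oral channel is 3x louder than the nasal one.
fs = 16000
t = np.arange(fs) / fs
nasal = np.sin(2 * np.pi * 200 * t)
oral = 3 * np.sin(2 * np.pi * 200 * t)
track = nasalance_track(nasal, oral)
```

With an oral amplitude three times the nasal amplitude, every frame of `track` is close to 25%. A track like this, with values near 0% for oral sounds and higher for nasal ones, is the kind of target an SI model can be trained to estimate from the acoustics alone.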
