Improving generalization of vocal tract feature reconstruction: from augmented acoustic inversion to articulatory feature reconstruction without articulatory data
Rosanna Turrisi, Raffaele Tavarone, Leonardo Badino

TL;DR
This paper explores methods for reconstructing vocal tract features from audio and phonetic labels, with the goal of generalizing across speakers and datasets. It includes a novel approach that generates articulatory features without using direct articulatory measurements.
Contribution
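It introduces a new approach that builds articulatory features from prior articulatory information extracted from phonetic labels, so that vocal tract movements can be recovered from an acoustic-only dataset without scarce articulatory measurements. It also shows that phonetic labels improve articulatory reconstruction in both matched and mismatched training-testing conditions.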
Findings
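As input to deep recurrent neural networks, phonetic labels are in general more helpful than acoustic features for articulatory reconstruction.
The features produced by the novel approach reach a correlation of up to 0.59 with actual articulatory measurements.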
Generated articulatory features enable reconstruction without direct articulatory data.
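The correlation figure above compares generated features with measured articulatory trajectories. The summary does not say which coefficient is used or how it is aggregated, so the following is only a minimal sketch that assumes a Pearson correlation computed separately for each articulatory channel. The function names `pearson` and `per_channel_correlation` and the toy data are hypothetical and are not taken from the paper.

```python
import math


def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences of floats."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


def per_channel_correlation(generated, measured):
    """Correlate each channel of two trajectories.

    Both arguments are lists of frames; each frame is a list with one value
    per articulatory channel (assumed layout, for illustration only).
    """
    if len(generated) != len(measured):
        raise ValueError("trajectories must have the same number of frames")
    # Transpose frames-by-channels into channels-by-frames.
    generated_channels = list(zip(*generated))
    measured_channels = list(zip(*measured))
    return [pearson(g, m) for g, m in zip(generated_channels, measured_channels)]


# Toy trajectories with 3 frames and 2 channels (made-up numbers).
generated = [[0.0, 1.0], [1.0, 0.0], [2.0, -1.0]]
measured = [[0.0, -2.0], [2.0, 0.0], [4.0, 2.0]]
scores = per_channel_correlation(generated, measured)
```

In this toy case the first channel moves in step with the measurement and the second moves in the opposite direction, so the two scores sit at the extremes of the coefficient's range. Real trajectories would first need to be time-aligned and sampled at the same frame rate before a comparison like this is meaningful.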
Abstract
We address the problem of reconstructing articulatory movements, given audio and/or phonetic labels. The scarce availability of multi-speaker articulatory data makes it difficult to learn a reconstruction that generalizes to new speakers and across datasets. We first consider the XRMB dataset, where audio, articulatory measurements and phonetic transcriptions are available. We show that phonetic labels, used as input to deep recurrent neural networks that reconstruct articulatory features, are in general more helpful than acoustic features in both matched and mismatched training-testing conditions. In a second experiment, we test a novel approach that attempts to build articulatory features from prior articulatory information extracted from phonetic labels. Such an approach recovers vocal tract movements directly from an acoustic-only dataset without using any articulatory measurement.…
Taxonomy
Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Phonetics and Phonology Research
