Improving generalization of vocal tract feature reconstruction: from augmented acoustic inversion to articulatory feature reconstruction without articulatory data

Rosanna Turrisi; Raffaele Tavarone; Leonardo Badino

arXiv:1809.00938·cs.CL·September 13, 2023

Open Access

TL;DR

This paper explores methods to improve the reconstruction of vocal tract features from audio data, including a novel approach that generates articulatory features without using direct articulatory measurements, enhancing generalization across datasets.

Contribution

It introduces a new approach to generate articulatory features solely from acoustic data, bypassing the need for scarce articulatory measurements, and demonstrates improved generalization in vocal tract feature reconstruction.

Findings

01 Phonetic labels are in general more helpful than acoustic features as input for articulatory reconstruction.

02 Articulatory features generated by the novel approach reach a correlation of up to 0.59 with measured articulatory features.

03 The novel approach recovers vocal tract movements from an acoustic-only dataset, without using any articulatory measurements.
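The correlation figure in the findings is the kind of score obtained by comparing generated and measured articulatory trajectories channel by channel. The sketch below is a minimal illustration, assuming a Pearson correlation computed separately for each articulatory channel. This page does not state which correlation measure or averaging scheme the authors used, and the function names and toy data are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


def per_channel_correlation(measured, generated):
    """Correlate each articulatory channel over time.

    Both arguments are lists of frames; each frame is a list with one
    value per channel (hypothetical layout, not taken from the paper).
    """
    n_channels = len(measured[0])
    return [
        pearson([frame[c] for frame in measured],
                [frame[c] for frame in generated])
        for c in range(n_channels)
    ]


# Toy trajectories, invented for illustration: 4 frames, 2 channels.
measured = [[0.0, 1.0], [1.0, 0.5], [2.0, 0.0], [3.0, 0.5]]
generated = [[0.1, 0.9], [0.9, 0.6], [2.2, 0.1], [2.8, 0.4]]
scores = per_channel_correlation(measured, generated)
```

Reporting one score per channel, rather than a single pooled number, keeps poorly reconstructed articulators visible. Whether the paper aggregates the scores this way is not stated here.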

Abstract

We address the problem of reconstructing articulatory movements, given audio and/or phonetic labels. The scarce availability of multi-speaker articulatory data makes it difficult to learn a reconstruction that generalizes to new speakers and across datasets. We first consider the XRMB dataset, where audio, articulatory measurements and phonetic transcriptions are available. We show that phonetic labels, used as input to deep recurrent neural networks that reconstruct articulatory features, are in general more helpful than acoustic features in both matched and mismatched training-testing conditions. In a second experiment, we test a novel approach that attempts to build articulatory features from prior articulatory information extracted from phonetic labels. This approach recovers vocal tract movements directly from an acoustic-only dataset without using any articulatory measurements.…

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Phonetics and Phonology Research