Complete reconstruction of the tongue contour through acoustic to articulatory inversion using real-time MRI data
Sofiane Azzouz, Pierre-André Vuissoz, Yves Laprie

TL;DR
This paper presents a novel deep learning approach using real-time MRI data to accurately reconstruct the entire tongue contour from speech signals, advancing acoustic-to-articulatory inversion techniques.
Contribution
It introduces a new method leveraging high-quality MRI data and explores various neural architectures to improve full tongue contour reconstruction from speech.
Findings
Median accuracy of 2.21 mm in tongue contour reconstruction
Effective use of a Bi-LSTM architecture with an autoencoder
Demonstrates feasibility of full tongue tracking from acoustic data
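
To make the accuracy figure above concrete, here is a minimal sketch of how a median contour error in millimetres could be computed. It is an illustration under stated assumptions, not the authors' evaluation code: the function name `median_contour_error_mm`, the `mm_per_unit` parameter, and the point-to-point pairing of predicted and reference contour points are all hypothetical, and the coordinates are made up.

```python
import math
from statistics import median


def median_contour_error_mm(predicted, reference, mm_per_unit=1.0):
    """Median Euclidean distance between paired contour points, in mm.

    Assumption: both contours are lists of (x, y) points sampled at the
    same positions along the tongue, so point i in one contour matches
    point i in the other. mm_per_unit converts coordinate units to mm.
    """
    if len(predicted) != len(reference):
        raise ValueError("contours must have the same number of points")
    if not predicted:
        raise ValueError("contours must not be empty")
    distances = [
        math.dist(p, r) * mm_per_unit
        for p, r in zip(predicted, reference)
    ]
    return median(distances)


# Toy example with made-up coordinates (not data from the paper).
reference = [(0.0, 0.0), (1.0, 1.0), (2.0, 1.5), (3.0, 1.0)]
predicted = [(0.0, 1.0), (1.0, 3.0), (2.0, 1.5), (6.0, 5.0)]
error = median_contour_error_mm(predicted, reference)
```

Reporting the median rather than the mean keeps a small number of badly reconstructed points from dominating the figure; in the toy example the point-wise distances are 1, 2, 0 and 5, so a single large miss does not set the result.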
Abstract
Acoustic articulatory inversion is a major processing challenge, with a wide range of applications from speech synthesis to feedback systems for language learning and rehabilitation. In recent years, deep learning methods have been applied to the inversion of less than a dozen geometrical positions corresponding to sensors glued to easily accessible articulators. It is therefore impossible to know the shape of the whole tongue from root to tip. In this work, we use high-quality real-time MRI data to track the contour of the tongue. The data used to drive the inversion are therefore the unstructured speech signal and the tongue contours. Several architectures relying on a Bi-LSTM, including or not an autoencoder to reduce the dimensionality of the latent space, and using or not the phonetic segmentation, have been explored. The results show that the tongue contour can be recovered with a…
Taxonomy
Topics: Cleft Lip and Palate Research
