Speaker dependent acoustic-to-articulatory inversion using real-time MRI of the vocal tract

Tamás Gábor Csapó

arXiv:2008.02098·eess.AS·August 6, 2020

TL;DR

This paper presents a speaker-dependent acoustic-to-articulatory inversion method using real-time MRI data, demonstrating that combining FC-DNNs and LSTMs produces high-quality, realistic vocal tract images from speech signals.

Contribution

It introduces a novel approach utilizing rtMRI data with deep neural networks, especially LSTMs, for accurate speaker-dependent acoustic-to-articulatory inversion.

Findings

01

LSTMs outperform other neural network models in this task.

02

The combined FC-DNN and LSTM approach achieves a CW-SSIM of 0.94.

03

Generated vocal tract images closely resemble original MRI recordings.

Abstract

Acoustic-to-articulatory inversion (AAI) methods estimate articulatory movements from the acoustic speech signal, which can be useful in several tasks such as speech recognition, synthesis, talking heads and language tutoring. Most earlier inversion studies are based on point-tracking articulatory techniques (e.g. EMA or XRMB). The advantage of rtMRI is that it provides dynamic information about the full midsagittal plane of the upper airway, with a high 'relative' spatial resolution. In this work, we estimated midsagittal rtMRI images of the vocal tract for speaker dependent AAI, using MGC-LSP spectral features as input. We applied FC-DNNs, CNNs and recurrent neural networks, and have shown that LSTMs are the most suitable for this task. As objective evaluation we measured normalized MSE, Structural Similarity Index (SSIM) and its complex wavelet version (CW-SSIM). The results indicate…
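The objective measures named in the abstract can be illustrated with a minimal NumPy sketch. This is not the paper's evaluation code: the normalization used for the MSE (dividing by the mean squared reference intensity) is an assumption, the SSIM below is a simplified single-window (global) variant rather than the usual locally windowed one, and CW-SSIM is not covered. The function names, the 64x64 image size, and the random test images are all hypothetical.

```python
import numpy as np


def normalized_mse(reference, estimate):
    """MSE divided by the mean squared reference intensity.

    Assumption: the paper's exact normalization is not stated in the
    abstract; this is one common choice. The reference must not be all zeros.
    """
    reference = np.asarray(reference, dtype=np.float64)
    estimate = np.asarray(estimate, dtype=np.float64)
    mse = np.mean((reference - estimate) ** 2)
    return mse / np.mean(reference ** 2)


def global_ssim(reference, estimate, data_range=255.0):
    """Simplified SSIM computed once over the whole image.

    Standard SSIM averages this statistic over local windows; the global
    form here only illustrates the formula.
    """
    x = np.asarray(reference, dtype=np.float64)
    y = np.asarray(estimate, dtype=np.float64)
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()
    cov_xy = np.mean((x - mu_x) * (y - mu_y))
    # Stabilizing constants from the usual SSIM definition (K1=0.01, K2=0.03).
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2
    numerator = (2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)
    denominator = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return numerator / denominator


# Hypothetical stand-ins for an original and a predicted MRI frame.
rng = np.random.default_rng(0)
original = rng.integers(0, 256, size=(64, 64)).astype(np.float64)
predicted = np.clip(original + rng.normal(0.0, 10.0, original.shape), 0, 255)

print("NMSE:", normalized_mse(original, predicted))
print("global SSIM:", global_ssim(original, predicted))
```

Lower normalized MSE and higher SSIM both indicate a predicted frame closer to the recording; an identical pair gives 0 and 1 respectively.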

Code & Models

Repositories

BME-SmartLab/speech2mri (TensorFlow, official)

Taxonomy

Topics: Phonetics and Phonology Research · Speech Recognition and Synthesis · Speech and Audio Processing