Speaker dependent articulatory-to-acoustic mapping using real-time MRI of the vocal tract

Tamás Gábor Csapó

arXiv:2008.00889·eess.AS·August 4, 2020

TL;DR

This paper explores speaker-dependent speech prediction from real-time MRI of the vocal tract using deep neural networks, demonstrating the effectiveness of CNN-LSTM models and highlighting the impact of data synchronization issues.

Contribution

It introduces the novel use of rtMRI for articulatory-to-speech mapping and compares various neural network architectures for this task.

Findings

1. CNN-LSTM networks outperform the other models in speech prediction accuracy.

2. rtMRI provides detailed articulatory data, including the velum and the pharyngeal region.

3. Synchronization issues significantly affect prediction quality, as shown in the results for speaker m1.

Abstract

Articulatory-to-acoustic (forward) mapping is a technique to predict speech using various articulatory acquisition techniques (e.g. ultrasound tongue imaging, lip video). Real-time MRI (rtMRI) of the vocal tract has not been used before for this purpose. The advantage of MRI is that it has a high 'relative' spatial resolution: it can capture not only lingual, labial and jaw motion, but also the velum and the pharyngeal region, which is typically not possible with other techniques. In the current paper, we train various DNNs (fully connected, convolutional and recurrent neural networks) for articulatory-to-speech conversion, using rtMRI as input, in a speaker-specific way. We use two male and two female speakers of the USC-TIMIT articulatory database, each of them uttering 460 sentences. We evaluate the results with objective (Normalized MSE and MCD) and subjective measures (perceptual…
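The two objective measures named in the abstract can be illustrated with a small sketch. The abstract does not give the exact formulas, so the definitions here are assumptions: MCD is taken as the commonly used mel-cepstral distortion in dB averaged over frames (energy coefficient already excluded), and Normalized MSE is taken as the mean squared error divided by the variance of the reference. The function names are hypothetical and are not taken from the paper's repository.

```python
import math


def mel_cepstral_distortion(ref, pred):
    """Mean MCD in dB over frames.

    ref, pred: lists of frames, each frame a list of mel-cepstral
    coefficients. Assumption: the energy coefficient (c0) has already
    been dropped, as is common practice.
    """
    if not ref or len(ref) != len(pred):
        raise ValueError("ref and pred need the same non-zero number of frames")
    # Constant of the usual MCD definition: (10 / ln 10) * sqrt(2)
    scale = 10.0 / math.log(10.0) * math.sqrt(2.0)
    total = 0.0
    for r, p in zip(ref, pred):
        if len(r) != len(p):
            raise ValueError("frames must have the same dimension")
        total += scale * math.sqrt(sum((a - b) ** 2 for a, b in zip(r, p)))
    return total / len(ref)


def normalized_mse(ref, pred):
    """MSE divided by the variance of the reference values.

    Assumption: this is one common normalization; the paper may use a
    different one. A value of 1.0 means the prediction is no better
    than always predicting the reference mean.
    """
    flat_ref = [v for frame in ref for v in frame]
    flat_pred = [v for frame in pred for v in frame]
    if not flat_ref or len(flat_ref) != len(flat_pred):
        raise ValueError("ref and pred need the same non-zero size")
    n = len(flat_ref)
    mean = sum(flat_ref) / n
    variance = sum((v - mean) ** 2 for v in flat_ref) / n
    if variance == 0.0:
        raise ValueError("reference has zero variance")
    mse = sum((a - b) ** 2 for a, b in zip(flat_ref, flat_pred)) / n
    return mse / variance
```

Under these definitions, both measures are 0 for a perfect prediction, and lower values indicate spectral features closer to the reference.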

Code & Models

Repositories

BME-SmartLab/mri2speech (TensorFlow, official)

Taxonomy

Topics: Speech and Audio Processing · Phonetics and Phonology Research · Speech Recognition and Synthesis