Reconstructing Speech from Real-Time Articulatory MRI Using Neural Vocoders

Yide Yu; Amin Honarmandi Shandiz; László Tóth

arXiv:2104.11598·cs.SD·April 26, 2021·1 cites

TL;DR

This paper explores reconstructing speech from real-time MRI of articulators using neural vocoders, demonstrating promising results in spectral shape recovery but highlighting the need for further refinement.

Contribution

It introduces a neural network-based method to reconstruct speech from MRI data using spectral vector estimation and neural vocoders, a novel approach in this domain.

Findings

01 Successfully reconstructs gross spectral shape of speech

02 Neural architectures outperform baseline in spectral estimation

03 Further improvements needed for fine spectral detail reproduction

Abstract

Several approaches exist for the recording of articulatory movements, such as electromagnetic and permanent magnetic articulography, ultrasound tongue imaging and surface electromyography. Although magnetic resonance imaging (MRI) is more costly than the above approaches, recent developments in this area now allow the recording of real-time MRI videos of the articulators with an acceptable resolution. Here, we experiment with the reconstruction of the speech signal from a real-time MRI recording using deep neural networks. Instead of estimating speech directly, our networks are trained to output a spectral vector, from which we reconstruct the speech signal using the WaveGlow neural vocoder. We compare the performance of three deep neural architectures for the estimation task, combining convolutional (CNN) and recurrence-based (LSTM) neural layers. Besides the mean absolute error…
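
The abstract names the mean absolute error as one of the measures applied to the estimated spectral vectors. As a minimal sketch of how such a score could be computed, assuming each utterance is stored as a list of frames and each frame as a list of spectral coefficients, the function below averages the absolute differences over all frames and coefficients. The function name, the data layout, and the toy values are illustrative assumptions and are not taken from the paper.

```python
from typing import Sequence


def mean_absolute_error(
    predicted: Sequence[Sequence[float]],
    target: Sequence[Sequence[float]],
) -> float:
    """Return the MAE averaged over every frame and every spectral coefficient.

    Both arguments are assumed to be frame-major: one inner sequence per
    time frame, one float per spectral coefficient (hypothetical layout).
    """
    if len(predicted) != len(target):
        raise ValueError("predicted and target must have the same number of frames")

    total = 0.0
    count = 0
    for pred_frame, target_frame in zip(predicted, target):
        if len(pred_frame) != len(target_frame):
            raise ValueError("each frame pair must have the same dimensionality")
        for p, t in zip(pred_frame, target_frame):
            total += abs(p - t)
            count += 1

    if count == 0:
        raise ValueError("cannot compute MAE of empty input")
    return total / count


# Toy example: two frames with three coefficients each (made-up values).
predicted = [[0.0, 1.0, 2.0], [1.0, 1.0, 1.0]]
target = [[0.5, 1.0, 1.0], [1.0, 2.0, 1.0]]
mae = mean_absolute_error(predicted, target)
```

In practice the frames would be the network outputs and the spectral vectors extracted from the recorded audio for the same time steps; a lower value indicates a closer match before the vocoder stage.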

Taxonomy

Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Ultrasonics and Acoustic Wave Propagation