Reconstructing Speech from Real-Time Articulatory MRI Using Neural Vocoders
Yide Yu, Amin Honarmandi Shandiz, László Tóth

TL;DR
This paper explores reconstructing speech from real-time MRI of articulators using neural vocoders, demonstrating promising results in spectral shape recovery but highlighting the need for further refinement.
Contribution
It introduces a neural network-based method to reconstruct speech from MRI data using spectral vector estimation and neural vocoders, a novel approach in this domain.
Findings
Successfully reconstructs gross spectral shape of speech
Neural architectures outperform baseline in spectral estimation
Further improvements needed for fine spectral detail reproduction
Abstract
Several approaches exist for the recording of articulatory movements, such as electromagnetic and permanent magnetic articulography, ultrasound tongue imaging and surface electromyography. Although magnetic resonance imaging (MRI) is more costly than the above approaches, the recent developments in this area now allow the recording of real-time MRI videos of the articulators with an acceptable resolution. Here, we experiment with the reconstruction of the speech signal from a real-time MRI recording using deep neural networks. Instead of estimating speech directly, our networks are trained to output a spectral vector, from which we reconstruct the speech signal using the WaveGlow neural vocoder. We compare the performance of three deep neural architectures for the estimation task, combining convolutional (CNN) and recurrence-based (LSTM) neural layers. Besides the mean absolute error…
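The abstract names the mean absolute error as one measure of how well the networks estimate spectral vectors. As a rough illustration only, the sketch below computes that error between estimated and reference spectral frames. The function name, the frames-by-bins list layout, and the toy values are assumptions made for this example and are not taken from the paper.

```python
def mean_absolute_error(predicted, target):
    """Average |p - t| over every frame and every spectral bin.

    `predicted` and `target` are lists of frames; each frame is a list of
    spectral values. This layout is an assumption made for the example.
    """
    if len(predicted) != len(target):
        raise ValueError("frame counts differ")
    total, count = 0.0, 0
    for p_frame, t_frame in zip(predicted, target):
        if len(p_frame) != len(t_frame):
            raise ValueError("spectral vector sizes differ")
        for p, t in zip(p_frame, t_frame):
            total += abs(p - t)
            count += 1
    if count == 0:
        raise ValueError("no values to compare")
    return total / count


# Toy data: 2 frames x 3 spectral bins (hypothetical values, not real results).
estimated = [[0.5, 1.0, 1.5], [2.0, 2.5, 3.0]]
reference = [[1.0, 1.0, 1.0], [2.0, 2.0, 2.0]]
mae = mean_absolute_error(estimated, reference)
```

A lower value means the estimated spectral vectors lie closer to the reference ones on average. It does not by itself say how natural the vocoded speech sounds.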
Taxonomy
Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Ultrasonics and Acoustic Wave Propagation
