Tagged-MRI Sequence to Audio Synthesis via Self Residual Attention Guided Heterogeneous Translator
Xiaofeng Liu, Fangxu Xing, Jerry L. Prince, Jiachen Zhuo, Maureen Stone, Georges El Fakhri, Jonghye Woo

TL;DR
This paper introduces a novel deep learning framework that translates tagged-MRI sequences into speech audio by leveraging spectrogram intermediates, attention mechanisms, and adversarial training, advancing speech disorder understanding.
Contribution
The work presents a fully convolutional asymmetric translator with self residual attention and disentanglement strategies for MRI-to-audio synthesis, a novel approach in this domain.
Findings
Generated speech waveforms are clearer and more realistic than those produced by competing methods.
The framework effectively captures muscular movements related to speech in MRI data.
Experimental results demonstrate superior performance with limited datasets.
Abstract
Understanding the underlying relationship between tongue and oropharyngeal muscle deformation seen in tagged-MRI and intelligible speech plays an important role in advancing speech motor control theories and treatment of speech-related disorders. Because of their heterogeneous representations, however, direct mapping between the two modalities -- i.e., two-dimensional (mid-sagittal slice) plus time tagged-MRI sequence and its corresponding one-dimensional waveform -- is not straightforward. Instead, we resort to two-dimensional spectrograms as an intermediate representation, which contains both pitch and resonance, from which to develop an end-to-end deep learning framework to translate from a sequence of tagged-MRI to its corresponding audio waveform with limited dataset size. Our framework is based on a novel fully convolutional asymmetry translator with guidance of a self residual…
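To make the intermediate representation concrete, the sketch below turns a one-dimensional waveform into a two-dimensional magnitude spectrogram (time frames by frequency bins) using a plain short-time Fourier transform. It is an illustration only and not the authors' pipeline: the abstract does not state the spectrogram type or its parameters, so the function name `magnitude_spectrogram`, the frame length, the hop size, the Hann window, and the toy 1 kHz tone are all assumptions chosen for readability.

```python
import cmath
import math


def hann(n):
    """Periodic Hann window of length n."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * i / n) for i in range(n)]


def magnitude_spectrogram(signal, frame_len=64, hop=32):
    """Return a list of frames; each frame holds frame_len // 2 + 1 magnitudes.

    frame_len and hop are illustrative values, not taken from the paper.
    """
    window = hann(frame_len)
    n_bins = frame_len // 2 + 1  # non-negative frequencies only
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        # Window the current frame to reduce spectral leakage.
        chunk = [signal[start + i] * window[i] for i in range(frame_len)]
        row = []
        for k in range(n_bins):
            # Direct DFT of the windowed frame at frequency bin k.
            acc = sum(
                chunk[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                for n in range(frame_len)
            )
            row.append(abs(acc))
        frames.append(row)
    return frames


# Toy input (assumption): a 1 kHz sine sampled at 8 kHz, 512 samples long.
sample_rate = 8000
tone = [math.sin(2.0 * math.pi * 1000.0 * t / sample_rate) for t in range(512)]

spec = magnitude_spectrogram(tone)

# Strongest frequency bin in the first frame, converted to Hz.
peak_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
peak_hz = peak_bin * sample_rate / 64

print(len(spec), "frames x", len(spec[0]), "bins; peak at", peak_hz, "Hz")
```

The output is a two-dimensional grid, which is the kind of array a convolutional translator can be trained to produce from an image sequence. A magnitude-only spectrogram discards phase, so turning it back into audio requires a separate reconstruction step; this sketch does not cover how the paper handles that.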
Taxonomy
Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Voice and Speech Disorders
