Tagged-MRI Sequence to Audio Synthesis via Self Residual Attention Guided Heterogeneous Translator

Xiaofeng Liu; Fangxu Xing; Jerry L. Prince; Jiachen Zhuo; Maureen Stone; Georges El Fakhri; Jonghye Woo

arXiv:2206.02284 · cs.SD · September 27, 2022

TL;DR

This paper introduces a novel deep learning framework that translates tagged-MRI sequences into speech audio by leveraging spectrogram intermediates, attention mechanisms, and adversarial training, advancing speech disorder understanding.

Contribution

The work presents a fully convolutional asymmetric translator with self residual attention and disentanglement strategies for MRI-to-audio synthesis, a novel approach in this domain.

Findings

01. Generated speech waveforms are clearer and more realistic than those produced by competing methods.

02. The framework effectively captures muscular movements related to speech in MRI data.

03. Experimental results demonstrate superior performance with limited datasets.

Abstract

Understanding the underlying relationship between tongue and oropharyngeal muscle deformation seen in tagged-MRI and intelligible speech plays an important role in advancing speech motor control theories and the treatment of speech-related disorders. Because of their heterogeneous representations, however, direct mapping between the two modalities (i.e., a two-dimensional (mid-sagittal slice) plus time tagged-MRI sequence and its corresponding one-dimensional waveform) is not straightforward. Instead, we resort to two-dimensional spectrograms as an intermediate representation, which contains both pitch and resonance, from which to develop an end-to-end deep learning framework to translate from a sequence of tagged-MRI to its corresponding audio waveform with limited dataset size. Our framework is based on a novel fully convolutional asymmetry translator with guidance of a self residual…
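To make the intermediate representation concrete, the following is a minimal sketch of how a one-dimensional waveform can be turned into a two-dimensional time-frequency array (a magnitude spectrogram). It is not the authors' pipeline: the function name `magnitude_spectrogram`, the frame length, the hop size, the Hann window, and the synthetic test tone are all assumptions chosen for illustration, and the paper's actual spectrogram settings are not given in this excerpt.

```python
import cmath
import math


def magnitude_spectrogram(waveform, frame_len=64, hop=32):
    """Return one list of bin magnitudes (bins 0..frame_len//2) per frame.

    Hypothetical helper for illustration; frame_len and hop are assumed values.
    """
    # Hann window to reduce spectral leakage at the frame edges.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len)
              for n in range(frame_len)]
    n_bins = frame_len // 2 + 1
    frames = []
    for start in range(0, len(waveform) - frame_len + 1, hop):
        segment = [waveform[start + n] * window[n] for n in range(frame_len)]
        # Naive DFT, O(frame_len^2) per frame; adequate for a small sketch.
        spectrum = [
            abs(sum(segment[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                    for n in range(frame_len)))
            for k in range(n_bins)
        ]
        frames.append(spectrum)
    return frames


# Assumed test signal: a 1 kHz tone sampled at 8 kHz, 512 samples long.
sample_rate = 8000
tone = [math.sin(2 * math.pi * 1000 * t / sample_rate) for t in range(512)]

spec = magnitude_spectrogram(tone)  # 2-D: time frames x frequency bins
peak_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
peak_hz = peak_bin * sample_rate / 64  # bin spacing = sample_rate / frame_len
```

Each row of `spec` is a time frame and each column a frequency bin, which is what makes the audio side comparable in shape to an image-like input. In this toy signal the energy concentrates in the bin corresponding to the tone's frequency; in speech, the harmonic spacing reflects pitch and the broader spectral envelope reflects resonance.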

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Voice and Speech Disorders