Real-Time MRI Video synthesis from time aligned phonemes with sequence-to-sequence networks

Sathvik Udupa; Prasanta Kumar Ghosh

arXiv:2210.16881·eess.AS·November 1, 2022·1 cites

TL;DR

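This paper introduces a sequence-to-sequence model, built from a transformer phoneme encoder and a convolutional frame decoder, that generates real-time MRI (rtMRI) videos from time-aligned phoneme sequences. Intermediary features sampled from a pretrained phoneme-conditioned variational autoencoder (CVAE) are used to modify the learning, with the aim of aiding speech production research.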

Contribution

It presents a novel model combining transformers and CVAE for subject-specific rtMRI video synthesis from phonemes, improving realism and generalization.

Findings

01. Model generates realistic rtMRI videos for unseen utterances.

02. Adding CVAE improves learning in difficult subject-specific mappings.

03. Subject-specific training enhances synthesis accuracy.

Abstract

Real-time magnetic resonance imaging (rtMRI) of the midsagittal plane of the mouth is of interest for speech production research. In this work, we focus on estimating utterance-level rtMRI video from the spoken phoneme sequence. We obtain time-aligned phonemes from forced alignment and convert them into frame-level phoneme sequences that are aligned with the rtMRI frames. We propose a sequence-to-sequence learning model with a transformer phoneme encoder and a convolutional frame decoder. We then modify the learning by using intermediary features sampled from a pretrained phoneme-conditioned variational autoencoder (CVAE). We train on 8 subjects in a subject-specific manner and demonstrate the performance with a subjective test. We also use an auxiliary task of air-tissue boundary (ATB) segmentation to obtain objective scores for the proposed models. We show that the proposed method…
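The abstract's step of turning time-aligned phonemes into frame-level phoneme sequences can be illustrated with a minimal sketch. It is not the authors' code: the function name `phonemes_to_frames`, the `(phoneme, start, end)` segment layout, the frame-centre assignment rule, the `sil` fallback label, and the frame rate of 20 fps are all assumptions chosen for illustration.

```python
from typing import List, Tuple

# Hypothetical segment layout: (phoneme, start time in s, end time in s),
# as a forced aligner might report it.
Segment = Tuple[str, float, float]


def phonemes_to_frames(
    segments: List[Segment],
    fps: float,
    num_frames: int,
    silence: str = "sil",
) -> List[str]:
    """Assign one phoneme label to each video frame.

    Assumed rule: a frame takes the phoneme whose interval contains the
    frame's centre time; frames covered by no interval get `silence`.
    """
    labels = []
    for i in range(num_frames):
        centre = (i + 0.5) / fps  # centre time of frame i in seconds
        label = silence
        for phoneme, start, end in segments:
            if start <= centre < end:
                label = phoneme
                break
        labels.append(label)
    return labels


# Illustrative alignment and frame rate (not values from the paper).
segments = [("sil", 0.0, 0.1), ("HH", 0.1, 0.2), ("AH", 0.2, 0.45)]
frame_labels = phonemes_to_frames(segments, fps=20.0, num_frames=10)
print(frame_labels)
```

The resulting list has one label per video frame, so a phoneme encoder and a frame decoder would operate on sequences of equal length. Other assignment rules, such as majority overlap within each frame, would be equally plausible.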

Code & Models

Repositories

bloodraven66/text_to_rtmri_synthesis
PyTorch · Official

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing

Methods: Conditional Variational Autoencoder