Speech Audio Synthesis from Tagged MRI and Non-Negative Matrix Factorization via Plastic Transformer

Xiaofeng Liu; Fangxu Xing; Maureen Stone; Jiachen Zhuo; Sidney Fels; Jerry L. Prince; Georges El Fakhri; Jonghye Woo

arXiv:2309.14586·cs.SD·September 27, 2023

TL;DR

This paper introduces the plastic light transformer (PLT), an end-to-end deep learning framework that translates weighting maps derived from tagged MRI into speech audio, using two-dimensional spectrograms as an intermediate representation.

Contribution

The work presents the first end-to-end deep learning model using PLT to synthesize speech from MRI-based functional units, incorporating innovative bias and pooling mechanisms for variable input sizes.

Findings

1. Outperforms conventional models in speech synthesis quality
2. Effectively models global correlations in matrix inputs
3. Maintains high realism with limited training data

Abstract

The tongue's intricate 3D structure, comprising localized functional units, plays a crucial role in the production of speech. When measured using tagged MRI, these functional units exhibit cohesive displacements and derived quantities that facilitate the complex process of speech production. Non-negative matrix factorization-based approaches have been shown to estimate the functional units through motion features, yielding a set of building blocks and a corresponding weighting map. Investigating the link between weighting maps and speech acoustics can offer significant insights into the intricate process of speech production. To this end, in this work, we utilize two-dimensional spectrograms as a proxy representation, and develop an end-to-end deep learning framework for translating weighting maps to their corresponding audio waveforms. Our proposed plastic light transformer (PLT)…
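The factorization step the abstract mentions can be illustrated with a minimal sketch. It uses the standard multiplicative-update rule for non-negative matrix factorization under a Frobenius-norm objective, which is not necessarily the variant the authors use. The names `nmf`, `V`, `W`, and `H`, the matrix shapes, and the reading of `W` as building blocks and `H` as the weighting map are assumptions made for illustration only.

```python
import numpy as np


def nmf(V, rank, n_iter=200, eps=1e-9, seed=0):
    """Factor a non-negative matrix V (n x m) into W (n x rank) and H (rank x m).

    Plain multiplicative updates for the Frobenius objective ||V - W H||^2.
    This is a generic illustration, not the paper's specific formulation.
    """
    rng = np.random.default_rng(seed)
    n, m = V.shape
    # Strictly positive random initialization keeps the updates well defined.
    W = rng.random((n, rank)) + eps
    H = rng.random((rank, m)) + eps
    for _ in range(n_iter):
        # Each update multiplies by a non-negative ratio, so W and H stay >= 0.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H


if __name__ == "__main__":
    # Hypothetical stand-in for a matrix of non-negative motion features.
    rng = np.random.default_rng(1)
    V = rng.random((20, 3)) @ rng.random((3, 15))
    W, H = nmf(V, rank=3)
    print("relative error:", np.linalg.norm(V - W @ H) / np.linalg.norm(V))
```

In this sketch the columns of `W` play the role of building blocks and each column of `H` gives the non-negative weights that combine them, which is the kind of weighting map the framework takes as input.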

Taxonomy

Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Phonetics and Phonology Research

Methods: Convolution