Deep Speech Synthesis from MRI-Based Articulatory Representations

Peter Wu; Tingle Li; Yijing Lu; Yubin Zhang; Jiachen Lian; Alan W Black; Louis Goldstein; Shinji Watanabe; Gopala K. Anumanchipalli

arXiv:2307.02471·eess.AS·July 6, 2023·Interspeech

TL;DR

This paper introduces an MRI-based articulatory feature set that covers a more extensive articulatory space than EMA, normalization and denoising procedures that help deep learning models trained on MRI data generalize, and an MRI-to-speech model that improves both computational efficiency and speech fidelity.

Contribution

The paper presents a novel MRI-based feature set and an MRI-to-speech model that outperform EMA-based methods in articulatory speech synthesis.

Findings

01 · MRI features are more comprehensive than EMA features.

02 · The proposed model improves speech fidelity and efficiency.

03 · Optimal MRI feature subset identified for synthesis.

Abstract

In this paper, we study articulatory synthesis, a speech synthesis method using human vocal tract information that offers a way to develop efficient, generalizable and interpretable synthesizers. While recent advances have enabled intelligible articulatory synthesis using electromagnetic articulography (EMA), these methods lack critical articulatory information like excitation and nasality, limiting generalization capabilities. To bridge this gap, we propose an alternative MRI-based feature set that covers a much more extensive articulatory space than EMA. We also introduce normalization and denoising procedures to enhance the generalizability of deep learning methods trained on MRI data. Moreover, we propose an MRI-to-speech model that improves both computational efficiency and speech fidelity. Finally, through a series of ablations, we show that the proposed MRI representation is more…

Code & Models

Repositories

articulatory/articulatory · PyTorch · Official

Taxonomy

Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Phonetics and Phonology Research