Acoustic-to-articulatory Inversion based on Speech Decomposition and Auxiliary Feature

Jianrong Wang; Jinyu Liu; Longxuan Zhao; Shanyu Wang; Ruiguo Yu; Li Liu

arXiv:2204.00873·cs.SD·April 5, 2022

Open Access

TL;DR

This paper introduces a novel approach for acoustic-to-articulatory inversion that uses speech decomposition and auxiliary features, significantly improving speaker-independent performance and accuracy over existing methods.

Contribution

The study proposes a pre-trained speech decomposition network and an auxiliary feature network to enhance speaker-independent AAI performance, addressing limitations of previous audio-only methods.

Findings

1. Reduces average RMSE by 0.29 in speaker-independent case
2. Increases correlation coefficient by 5.0% in speaker-independent case
3. Outperforms state-of-the-art methods using only audio features

Abstract

Acoustic-to-articulatory inversion (AAI) aims to recover the movement of the articulators from speech signals. Speaker-independent AAI remains a challenge because of the limited data available. Moreover, most current work uses only audio speech as input, which creates an inevitable performance bottleneck. To address these problems, we first pre-train a speech decomposition network that decomposes audio speech into a speaker embedding and a content embedding, which serve as new personalized speech features adapted to the speaker-independent case. Second, to further improve AAI, we propose a novel auxiliary feature network that estimates lip auxiliary features from these personalized speech features. Experimental results on three public datasets show that, compared with the state-of-the-art method that uses only the audio speech feature, the proposed method reduces the average RMSE by 0.25 and increases the…
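
The abstract reports results as average RMSE and average correlation coefficient between predicted and measured articulator trajectories. As a rough illustration of what those two metrics measure, the sketch below computes them with the standard textbook definitions in plain Python. The function names (`rmse`, `pearson`, `evaluate`) and the choice to average over articulator channels are assumptions made for this example; the paper's exact evaluation protocol is not given in the text above.

```python
import math


def rmse(pred, target):
    """Root-mean-square error between two equal-length trajectories."""
    if len(pred) != len(target) or not pred:
        raise ValueError("trajectories must be non-empty and of equal length")
    squared = sum((p - t) ** 2 for p, t in zip(pred, target))
    return math.sqrt(squared / len(pred))


def pearson(pred, target):
    """Pearson correlation coefficient between two equal-length trajectories."""
    if len(pred) != len(target) or len(pred) < 2:
        raise ValueError("trajectories must have equal length of at least 2")
    n = len(pred)
    mean_p = sum(pred) / n
    mean_t = sum(target) / n
    cov = sum((p - mean_p) * (t - mean_t) for p, t in zip(pred, target))
    var_p = sum((p - mean_p) ** 2 for p in pred)
    var_t = sum((t - mean_t) ** 2 for t in target)
    if var_p == 0 or var_t == 0:
        raise ValueError("correlation is undefined for a constant trajectory")
    return cov / math.sqrt(var_p * var_t)


def evaluate(pred_channels, target_channels):
    """Average RMSE and correlation over articulator channels.

    Assumption: each channel is one articulator coordinate over time, and the
    reported figure is the unweighted mean over channels.
    """
    if len(pred_channels) != len(target_channels) or not pred_channels:
        raise ValueError("channel lists must be non-empty and of equal length")
    pairs = list(zip(pred_channels, target_channels))
    avg_rmse = sum(rmse(p, t) for p, t in pairs) / len(pairs)
    avg_corr = sum(pearson(p, t) for p, t in pairs) / len(pairs)
    return avg_rmse, avg_corr
```

A lower RMSE means the predicted positions are closer to the measured ones, while a correlation nearer to 1 means the predicted trajectory follows the shape of the measured movement, even if it is offset or scaled.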

Taxonomy

Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Phonetics and Phonology Research