Acoustic-to-articulatory Speech Inversion with Multi-task Learning

Yashish M. Siriwardena; Ganesh Sivaraman; Carol Espy-Wilson

arXiv:2205.13755 · eess.AS · May 18, 2023 · Interspeech · 1 cites

Open Access

TL;DR

This paper introduces a multi-task learning (MTL) framework, built on a bidirectional gated RNN, that performs acoustic-to-articulatory speech inversion while jointly learning an acoustic-to-phoneme mapping. The MTL model outperforms a single-task baseline.

Contribution

It presents a multi-task learning approach, based on a bidirectional gated RNN, for acoustic-to-articulatory speech inversion, in which acoustic-to-phoneme mapping is learned simultaneously as a shared task.

Findings

01. The proposed MTL model achieves a higher correlation between estimated and actual tract variables (TVs) than the baseline model.

02. Jointly learning the acoustic-to-phoneme mapping improves inversion accuracy.

03. The model maps input acoustic features to nine TVs.
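The correlation-based evaluation mentioned in the findings can be sketched as follows. This is a minimal illustration, not the authors' evaluation code: it assumes the correlation is the Pearson coefficient computed separately for each tract variable over all frames, and the function names `pearson` and `per_tv_correlation` are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length sequences of floats."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("sequences must have equal length of at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


def per_tv_correlation(estimated, actual):
    """Return one correlation per tract variable.

    Assumption: `estimated` and `actual` are lists of frames, and each
    frame is a list holding one value per tract variable.
    """
    num_tvs = len(actual[0])
    return [
        pearson([frame[k] for frame in estimated],
                [frame[k] for frame in actual])
        for k in range(num_tvs)
    ]
```

Averaging the returned list would give a single summary score, though whether the paper reports per-TV or averaged values is not stated on this page.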

Abstract

Multi-task learning (MTL) frameworks have proven effective in diverse speech-related tasks such as automatic speech recognition (ASR) and speech emotion recognition. This paper proposes an MTL framework that performs acoustic-to-articulatory speech inversion while simultaneously learning an acoustic-to-phoneme mapping as a shared task. We use the Haskins Production Rate Comparison (HPRC) database, which contains both electromagnetic articulography (EMA) data and the corresponding phonetic transcriptions. System performance was measured by computing the correlation between the estimated and actual tract variables (TVs) in the acoustic-to-articulatory speech inversion task. The proposed MTL-based Bidirectional Gated Recurrent Neural Network (RNN) model learns to map the input acoustic features to nine TVs while outperforming the baseline model trained to perform only acoustic to…
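As a rough illustration of how an inversion objective and a shared phoneme task might be combined, the sketch below computes a per-frame multi-task loss. Everything specific here is an assumption rather than something stated in the abstract: the mean-squared-error term for the TVs, the cross-entropy term for the phoneme label, the weighted-sum combination, and the weight `alpha` are all hypothetical choices, as is the name `mtl_loss`.

```python
import math


def mse(pred, target):
    """Mean squared error between predicted and reference TV values."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def cross_entropy(logits, label):
    """Cross-entropy of the class `label` under softmax(logits).

    Uses the log-sum-exp trick for numerical stability.
    """
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]


def mtl_loss(tv_pred, tv_true, phone_logits, phone_label, alpha=0.5):
    """Weighted sum of the inversion loss and the phoneme loss.

    Assumption: `alpha` balances the two tasks; the paper's actual
    weighting scheme is not given in the abstract.
    """
    return mse(tv_pred, tv_true) + alpha * cross_entropy(phone_logits, phone_label)
```

With `alpha=0.0` the objective reduces to the single-task inversion loss, which corresponds in spirit to the baseline that the abstract compares against.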

Taxonomy

Topics: Speech Recognition and Synthesis · Phonetics and Phonology Research · Speech and Audio Processing