Speaker-Independent Acoustic-to-Articulatory Inversion through Multi-Channel Attention Discriminator

Woo-Jin Chung; Hong-Goo Kang

arXiv:2406.17329·eess.SP·June 26, 2024

TL;DR

This paper introduces a speaker-independent acoustic-to-articulatory inversion (AAI) model that uses self-supervised learning (SSL) representations and an attention-based discriminator, reaching a Pearson correlation coefficient of 0.847, which the authors report as state of the art among speaker-independent AAI models.

Contribution

The model combines features from a pre-trained SSL model with adversarial training against an attention-based Multi-Duration Phoneme Discriminator (MDPD) to improve speaker-independent AAI.

Findings

01

Achieves a Pearson correlation coefficient of 0.847.

02

Outperforms previous speaker-independent AAI models.

03

Uses a multi-channel attention discriminator to capture the relationships among multi-channel articulatory signals.

Abstract

We present a novel speaker-independent acoustic-to-articulatory inversion (AAI) model, overcoming the limitations observed in conventional AAI models that rely on acoustic features derived from restricted datasets. To address these challenges, we leverage representations from a pre-trained self-supervised learning (SSL) model to more effectively estimate the global, local, and kinematic pattern information in Electromagnetic Articulography (EMA) signals during the AAI process. We train our model using an adversarial approach and introduce an attention-based Multi-duration phoneme discriminator (MDPD) designed to fully capture the intricate relationship among multi-channel articulatory signals. Our method achieves a Pearson correlation coefficient of 0.847, marking state-of-the-art performance in speaker-independent AAI models. The implementation details and code can be found online.
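
The Pearson correlation coefficient quoted in the abstract measures how closely a predicted articulatory trajectory follows the measured one. The abstract does not spell out the evaluation protocol, so the following is only a minimal sketch under stated assumptions: the score is computed per EMA channel and then averaged over channels, and the function names (`pearson`, `mean_channel_pcc`) and the toy data are hypothetical illustrations, not taken from the authors' repository.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


def mean_channel_pcc(predicted, reference):
    """Average the per-channel PCC.

    Assumption: both inputs are lists of frames, and each frame is a list
    holding one value per articulatory channel.
    """
    n_channels = len(reference[0])
    scores = []
    for c in range(n_channels):
        pred_c = [frame[c] for frame in predicted]
        ref_c = [frame[c] for frame in reference]
        scores.append(pearson(pred_c, ref_c))
    return sum(scores) / n_channels


# Toy two-channel trajectories (made-up numbers, four frames each).
reference = [[0.0, 1.0], [1.0, 0.0], [2.0, 1.0], [3.0, 0.0]]
predicted = [[0.1, 0.9], [1.1, 0.2], [1.9, 0.8], [3.2, 0.1]]
score = mean_channel_pcc(predicted, reference)
print(f"mean per-channel PCC: {score:.3f}")
```

Because the coefficient is invariant to the scale and offset of each channel, it rewards predictions that reproduce the shape of a trajectory even when absolute positions differ, which is one reason it is a common choice when articulatory measurements vary across speakers.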

Code & Models

Repositories

Woo-jin-Chung/Multi-Duration-Phoneme-Discriminator-Acoustic-to-Articulatory-Inversion
PyTorch · Official

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing