Training Articulatory Inversion Models for Interspeaker Consistency

Charles McGhee; Mark J.F. Gales; Kate M. Knill

arXiv:2505.20529 · cs.SD · June 10, 2025

Open Access

TL;DR

This paper investigates whether self-supervised learning (SSL) models adapted for acoustic-to-articulatory inversion produce articulatory predictions that are consistent across speakers in English and Russian, and introduces a novel evaluation method and a training approach.

Contribution

The paper introduces a training method that uses speech data to improve interspeaker consistency in articulatory inversion models, together with an evaluation technique, based on minimal pair sets, for measuring that consistency.

Findings

01. Models trained with the proposed method show improved interspeaker consistency.

02. The evaluation method effectively measures articulatory target similarity across speakers.

03. Results are demonstrated on English and Russian datasets.

Abstract

Acoustic-to-Articulatory Inversion (AAI) attempts to model the inverse mapping from speech to articulation. Exact articulatory prediction from speech alone may be impossible, as speakers can choose different forms of articulation seemingly without reference to their vocal tract structure. However, once a speaker has selected an articulatory form, their productions vary minimally. Recent works in AAI have proposed adapting Self-Supervised Learning (SSL) models to single-speaker datasets, claiming that these single-speaker models provide a universal articulatory template. In this paper, we investigate whether SSL-adapted models trained on single and multi-speaker data produce articulatory targets which are consistent across speaker identities for English and Russian. We do this through the use of a novel evaluation method which extracts articulatory targets using minimal pair sets. We…

Taxonomy

Topics: Phonetics and Phonology Research · Speech Recognition and Synthesis · Speech and Audio Processing