Evaluating Speech Articulation Synthesis with Articulatory Phoneme Recognition

Vinicius Ribeiro; Yves Laprie

arXiv:2605.20920·cs.CL·May 21, 2026

TL;DR

This paper proposes evaluating speech articulation synthesis by using phoneme recognition with articulatory features as a proxy, aiming to better capture phonetic nuances than traditional metrics.

Contribution

The paper introduces an evaluation method that runs phoneme recognition on articulatory features and uses the recognition results to assess synthesis quality.

Findings

01 The articulatory feature set is phonetically rich.

02 Phoneme recognition performance varies across different synthetic articulatory features.

03 The method captures nuances in phoneme production better than traditional metrics.

Abstract

Recent advances in machine learning and the availability of articulatory datasets allow vocal tract synthesis to be conditioned on phonetic sequences, a primary task of articulatory speech synthesis. Quality assessment, however, still needs a better definition. Ranking generative models is generally tricky because of subjectivity, and articulatory synthesis has the additional difficulty of requiring specialized knowledge of vocal tract anatomy and acoustics. To address this problem, this paper proposes to evaluate speech articulation synthesis using phoneme recognition as a proxy. Our hypothesis is that phoneme recognition using articulatory features better captures nuances in phoneme production, such as correct places of articulation, which traditional metrics (e.g., point-wise distance metrics) do not. We train a neural network with acoustic and articulatory features extracted from a…
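To make the contrast in the abstract concrete, the sketch below places the two kinds of score side by side: a recognition-based score computed on phoneme sequences, and a point-wise distance computed on articulator coordinates. It is only an illustration under stated assumptions. The excerpt does not say which recognition metric the authors report, so phoneme error rate (edit distance normalised by reference length) is assumed here, and the function names `edit_distance`, `phoneme_error_rate`, and `mean_pointwise_distance` are hypothetical, not taken from the paper.

```python
import math
from typing import Sequence, Tuple


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance between two phoneme sequences."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def phoneme_error_rate(ref: Sequence[str], hyp: Sequence[str]) -> float:
    """Edit distance normalised by reference length (assumed metric)."""
    if not ref:
        raise ValueError("reference sequence must not be empty")
    return edit_distance(ref, hyp) / len(ref)


def mean_pointwise_distance(a: Sequence[Tuple[float, float]],
                            b: Sequence[Tuple[float, float]]) -> float:
    """Mean Euclidean distance between corresponding 2D points."""
    if len(a) != len(b) or not a:
        raise ValueError("point sequences must be non-empty and equal length")
    total = sum(math.hypot(ax - bx, ay - by)
                for (ax, ay), (bx, by) in zip(a, b))
    return total / len(a)


if __name__ == "__main__":
    # Hypothetical recogniser output: one substitution out of three phonemes.
    print(phoneme_error_rate(["b", "a", "t"], ["p", "a", "t"]))
    # Hypothetical articulator contours, in arbitrary units.
    print(mean_pointwise_distance([(0.0, 0.0), (3.0, 4.0)],
                                  [(0.0, 0.0), (0.0, 0.0)]))
```

A small geometric error at a constriction can leave the point-wise distance nearly unchanged while changing which phoneme a recogniser outputs, which is the kind of nuance the abstract's hypothesis refers to.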
