Coding Speech through Vocal Tract Kinematics

Cheol Jun Cho; Peter Wu; Tejas S. Prabhune; Dhruv Agarwal; Gopala K. Anumanchipalli

arXiv:2406.12998 · eess.AS · March 4, 2025 · 2 cites

TL;DR

This paper introduces SPARC, a neural framework that encodes and decodes speech through interpretable vocal tract kinematic features, enabling high-quality synthesis and zero-shot voice conversion.

Contribution

The paper presents an articulatory coding framework that infers kinematic vocal tract features from speech audio and synthesizes speech audio back from those features, achieving high intelligibility and generalization across speakers.

Findings

01. Achieves fully intelligible, high-quality speech synthesis from articulatory features.

02. Enables zero-shot voice conversion while preserving speaker identity.

03. Generalizes well to unseen speakers and accents.

Abstract

Vocal tract articulation is a natural, grounded control space of speech production. The spatiotemporal coordination of articulators combined with the vocal source shapes intelligible speech sounds to enable effective spoken communication. Based on this physiological grounding of speech, we propose a new framework of neural encoding-decoding of speech -- Speech Articulatory Coding (SPARC). SPARC comprises an articulatory analysis model that infers articulatory features from speech audio, and an articulatory synthesis model that synthesizes speech audio from articulatory features. The articulatory features are kinematic traces of vocal tract articulators and source features, which are intuitively interpretable and controllable, being the actual physical interface of speech production. An additional speaker identity encoder is jointly trained with the articulatory synthesizer to inform the…
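
The data flow described in the abstract (an analysis model producing articulatory and source features, a speaker identity encoder, and a synthesis model consuming both) can be sketched roughly as follows. This is a minimal, shape-only illustration and not the SPARC implementation or its API: every name (`ArticulatoryCode`, `analyze`, `synthesize`, `convert_voice`), the channel count, the frame rate, the embedding size, and the choice of pitch and loudness as source features are assumptions made for the example. The stand-in functions only produce values of the right shape where the real system would use trained networks, and treating voice conversion as a swap of the speaker embedding is likewise an assumption about how the disentangled code would be used.

```python
from dataclasses import dataclass
from typing import List

# All names and sizes below are hypothetical placeholders, not the SPARC API.
N_ARTICULATORY = 12   # assumed number of kinematic channels per frame
FRAME_RATE_HZ = 50    # assumed feature frame rate
SPEAKER_DIM = 4       # assumed speaker-embedding size (kept tiny for the sketch)


@dataclass
class ArticulatoryCode:
    kinematics: List[List[float]]  # [frames][N_ARTICULATORY] articulator traces
    source: List[List[float]]      # [frames][2] assumed source features: pitch, loudness
    speaker: List[float]           # [SPEAKER_DIM] speaker identity embedding


def analyze(audio: List[float], sample_rate: int) -> ArticulatoryCode:
    """Stand-in for the analysis model: audio -> articulatory code.

    A real model would be a trained network; this only yields correct shapes.
    """
    hop = sample_rate // FRAME_RATE_HZ
    n_frames = len(audio) // hop
    kinematics = [[0.0] * N_ARTICULATORY for _ in range(n_frames)]
    source = []
    for i in range(n_frames):
        frame = audio[i * hop:(i + 1) * hop]
        loudness = sum(abs(x) for x in frame) / hop  # crude per-frame energy proxy
        source.append([0.0, loudness])               # pitch left as a 0.0 placeholder
    mean = sum(audio) / len(audio) if audio else 0.0
    speaker = [mean] * SPEAKER_DIM                   # dummy utterance-level embedding
    return ArticulatoryCode(kinematics, source, speaker)


def synthesize(code: ArticulatoryCode, sample_rate: int) -> List[float]:
    """Stand-in for the synthesis model: returns silence of the matching length."""
    hop = sample_rate // FRAME_RATE_HZ
    return [0.0] * (len(code.kinematics) * hop)


def convert_voice(content: ArticulatoryCode, target: ArticulatoryCode) -> ArticulatoryCode:
    """Keep the articulation of `content`, take the speaker embedding of `target`."""
    return ArticulatoryCode(content.kinematics, content.source, target.speaker)


if __name__ == "__main__":
    sr = 16000
    a = analyze([0.1] * sr, sr)          # one second of dummy audio, "speaker A"
    b = analyze([-0.2] * (sr // 2), sr)  # half a second of dummy audio, "speaker B"
    converted = convert_voice(a, b)
    print(len(converted.kinematics), len(synthesize(converted, sr)))
```

The point of the sketch is the separation of concerns: the frame-level features carry what is said and how it is articulated, while a single utterance-level vector carries who is speaking, so editing either part leaves the other untouched.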

Code & Models

Repositories

berkeley-speech-group/speech-articulatory-coding
PyTorch · Official

Taxonomy

Topics: Phonetics and Phonology Research