Speaker conditioned acoustic-to-articulatory inversion using x-vectors

Aravind Illa; Prasanta Kumar Ghosh

arXiv:2006.11536 · eess.AS · June 23, 2020 · Interspeech

TL;DR

This paper investigates the use of x-vectors as speaker-specific conditioning features to improve acoustic-to-articulatory inversion (AAI). It demonstrates benefits both for speakers seen during training (a closed-set condition) and for unseen speakers.

Contribution

It introduces x-vectors for speaker conditioning in AAI. This improves performance and generalization compared with the one-hot speaker encoding used in prior work.

Findings

1. X-vectors improve AAI accuracy for speakers seen during training.
2. X-vectors generalize well to unseen speakers.
3. Conditioning on x-vectors benefits speaker-independent AAI.

Abstract

Speech production involves the movement of various articulators, including the tongue, jaw, and lips. Estimating the movement of the articulators from the acoustics of speech is known as acoustic-to-articulatory inversion (AAI). Recent work has shown that pooling acoustic-articulatory data from multiple speakers is more beneficial than training AAI in a speaker-specific manner. That work also showed that further conditioning AAI on speaker-specific information improves performance in a closed-set condition, where the same speakers are used for training and testing. In that setup, the speaker information is a one-hot encoding supplied at the input alongside the acoustic features. In this work, we carry out an experimental study on the benefit of using x-vectors to provide speaker-specific information for conditioning AAI. Experiments with 30 speakers have shown that AAI performance benefits from the use of x-vectors in a closed-set, seen-speaker condition. Further,…
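The input-side conditioning described in the abstract can be sketched as follows. This is a minimal illustration under a stated assumption: the speaker vector (a one-hot code or an x-vector) is an utterance-level vector that is simply appended to every frame of the acoustic features before they enter the AAI model. The helpers `one_hot` and `condition_frames` and the toy feature values are hypothetical. The sketch does not reproduce the authors' network, features, or x-vector extractor.

```python
from typing import List, Sequence


def one_hot(speaker_index: int, num_speakers: int) -> List[float]:
    """Hypothetical helper: one-hot speaker code, one slot per training speaker."""
    if not 0 <= speaker_index < num_speakers:
        # A one-hot code has no slot for a speaker outside the training set.
        raise ValueError("one-hot encoding cannot represent an unseen speaker")
    code = [0.0] * num_speakers
    code[speaker_index] = 1.0
    return code


def condition_frames(
    acoustic: Sequence[Sequence[float]], speaker_vec: Sequence[float]
) -> List[List[float]]:
    """Hypothetical helper: append one utterance-level speaker vector to every frame.

    The output has shape (num_frames, acoustic_dim + speaker_dim).
    """
    return [list(frame) + list(speaker_vec) for frame in acoustic]


if __name__ == "__main__":
    # Toy acoustic features: 2 frames x 3 coefficients (illustrative values only).
    feats = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]

    # Closed-set conditioning: speaker 2 out of 30 training speakers.
    x_onehot = condition_frames(feats, one_hot(2, 30))
    print(len(x_onehot), len(x_onehot[0]))  # 2 33

    # X-vector conditioning: a toy 4-dim stand-in for a real x-vector embedding.
    xvec = [0.7, -0.2, 0.05, 1.1]
    x_xvec = condition_frames(feats, xvec)
    print(x_xvec[0])
```

A one-hot code has one slot per training speaker, so it has no representation for a new speaker. An x-vector, by contrast, can be extracted from any speaker's audio, which is why x-vector conditioning can also be applied to unseen speakers.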