Speaker conditioned acoustic-to-articulatory inversion using x-vectors
Aravind Illa, Prasanta Kumar Ghosh

TL;DR
This paper investigates x-vectors as speaker-specific conditioning features for acoustic-to-articulatory inversion (AAI), reporting benefits in the closed-set seen-speaker condition and for unseen speakers.
Contribution
It introduces the use of x-vectors for speaker conditioning in AAI, showing improved performance and generalization over traditional one-hot encoding methods.
Findings
X-vectors improve AAI accuracy for known speakers.
X-vectors generalize well to unseen speakers.
Conditioning with x-vectors benefits speaker-independent AAI.
Abstract
Speech production involves the movement of various articulators, including the tongue, jaw, and lips. Estimating the movement of the articulators from the acoustics of speech is known as acoustic-to-articulatory inversion (AAI). Recently, it has been shown that instead of training AAI in a speaker-specific manner, pooling the acoustic-articulatory data from multiple speakers is beneficial. Further, additional conditioning with speaker-specific information by one-hot encoding at the input of AAI along with acoustic features benefits the AAI performance in a closed-set speaker train and test condition. In this work, we carry out an experimental study on the benefit of using x-vectors for providing speaker-specific information to condition AAI. Experiments with 30 speakers have shown that the AAI performance benefits from the use of x-vectors in a closed-set seen-speaker condition. Further,…
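The abstract describes conditioning AAI by supplying a speaker vector at the input along with the acoustic features. A minimal sketch of that idea follows, assuming the speaker vector is simply appended to every acoustic frame. The function names, the toy dimensions, and the placeholder x-vector values are hypothetical and are not taken from the paper, and the abstract does not confirm that x-vectors are fed in exactly this way.

```python
from typing import List, Sequence


def one_hot(speaker_index: int, num_speakers: int) -> List[float]:
    """Build a one-hot speaker code; only defined for speakers in the training set."""
    if not 0 <= speaker_index < num_speakers:
        raise ValueError("speaker_index is outside the closed speaker set")
    return [1.0 if i == speaker_index else 0.0 for i in range(num_speakers)]


def condition_frames(
    acoustic_frames: Sequence[Sequence[float]],
    speaker_vector: Sequence[float],
) -> List[List[float]]:
    """Append the same speaker vector to every acoustic frame (assumed scheme)."""
    return [list(frame) + list(speaker_vector) for frame in acoustic_frames]


# Toy utterance: 2 frames with 3 acoustic features each (illustrative sizes only).
frames = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]

# One-hot conditioning: the code length equals the number of training speakers.
one_hot_input = condition_frames(frames, one_hot(speaker_index=1, num_speakers=4))

# X-vector conditioning: a fixed-length embedding, here a made-up placeholder.
# A real x-vector would come from a separately trained speaker-embedding network.
x_vector = [0.7, -0.2]
xvec_input = condition_frames(frames, x_vector)
```

In this sketch a one-hot code cannot represent a speaker outside the training set, which is why `one_hot` raises an error for an out-of-range index, whereas an embedding of fixed length can be computed for any speaker's audio.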
