The Secret Source: Incorporating Source Features to Improve Acoustic-to-Articulatory Speech Inversion
Yashish M. Siriwardena, Carol Espy-Wilson

TL;DR
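This paper introduces a temporal convolution based acoustic-to-articulatory speech inversion (SI) system that predicts source features (aperiodicity, periodicity, and pitch) as additional targets, improving performance by close to 28% on the HPRC dataset and outperforming the current best SI models by around 9% on the XRMB dataset.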
Contribution
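It proposes an SI system that adds acoustically derived source features (aperiodicity, periodicity, and pitch) as additional targets and uses a temporal convolution architecture, with auditory spectrograms as input, to model long-range dependencies and interactions between the source and the vocal tract.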
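To illustrate what "source features as additional targets" could look like in practice, here is a minimal sketch in plain Python. It assumes per-frame targets are formed by appending the three source features to the articulatory targets, and it uses mean squared error over all dimensions as the training objective. The helper names (`build_targets`, `mse`), the two articulatory dimensions, the numeric values, and the choice of loss are all hypothetical and are not taken from the paper.

```python
def build_targets(articulatory, aperiodicity, periodicity, pitch):
    """Append the three source features to each frame's articulatory targets."""
    return [
        list(a) + [ap, pe, f0]
        for a, ap, pe, f0 in zip(articulatory, aperiodicity, periodicity, pitch)
    ]


def mse(pred, target):
    """Mean squared error over all frames and all target dimensions.

    Assumption: a plain MSE stands in for whatever loss the authors used.
    """
    n = sum(len(row) for row in target)
    total = sum(
        (p - t) ** 2
        for pred_row, target_row in zip(pred, target)
        for p, t in zip(pred_row, target_row)
    )
    return total / n


# Two frames with two hypothetical articulatory dimensions (made-up values).
articulatory = [[0.1, -0.2], [0.0, 0.3]]
aperiodicity = [0.5, 0.4]
periodicity = [0.5, 0.6]
pitch = [0.2, 0.25]

targets = build_targets(articulatory, aperiodicity, periodicity, pitch)
# Each frame now carries 2 articulatory values plus 3 source features.
```

With this layout a single network output layer can predict articulatory and source targets jointly, so errors on the source features also shape the shared representation.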
Findings
Using source features as additional targets improves the proposed SI system by close to 28% on the HPRC dataset.
The same system outperforms the current best performing SI models by around 9% on the XRMB dataset.
Results are compared against three baseline SI model architectures on both datasets.
Abstract
In this work, we incorporated acoustically derived source features, aperiodicity, periodicity and pitch as additional targets to an acoustic-to-articulatory speech inversion (SI) system. We also propose a Temporal Convolution based SI system, which uses auditory spectrograms as the input speech representation, to learn long-range dependencies and complex interactions between the source and vocal tract, to improve the SI task. The experiments are conducted with both the Wisconsin X-ray microbeam (XRMB) and Haskins Production Rate Comparison (HPRC) datasets, with comparisons done with respect to three baseline SI model architectures. The proposed SI system with the HPRC dataset gains an improvement of close to 28% when the source features are used as additional targets. The same SI system outperforms the current best performing SI models by around 9% on the XRMB dataset.
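The abstract credits temporal convolution with capturing long-range dependencies. One common way to get that property is to stack 1-D convolutions with growing dilation, so each output frame sees a wide window of input frames. The sketch below, in plain Python, shows how the receptive field grows under that assumption. The kernel size, the dilation schedule, and the function names are illustrative choices and do not reflect the authors' actual architecture.

```python
def dilated_conv1d(x, kernel, dilation):
    """Same-length 1-D convolution with a dilated, centered kernel (zero padded)."""
    k = len(kernel)
    out = []
    for t in range(len(x)):
        acc = 0.0
        for j, w in enumerate(kernel):
            idx = t + (j - k // 2) * dilation
            if 0 <= idx < len(x):
                acc += w * x[idx]
        out.append(acc)
    return out


def receptive_field(kernel_size, dilations):
    """Number of input frames that can influence one output frame."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)


# Hypothetical configuration: kernel size 3, dilations doubling per layer.
dilations = [1, 2, 4, 8]
kernel = [1.0, 1.0, 1.0]

# Push a unit impulse through the stack to see how far its influence spreads.
signal = [0.0] * 64
signal[32] = 1.0
y = signal
for d in dilations:
    y = dilated_conv1d(y, kernel, d)

spread = sum(1 for v in y if v != 0.0)
```

Here `spread` equals `receptive_field(3, dilations)`: four layers cover 31 frames, whereas four undilated layers with the same kernel would cover only 9.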
Taxonomy
Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing
Methods: Convolution
