The Secret Source: Incorporating Source Features to Improve Acoustic-to-Articulatory Speech Inversion

Yashish M. Siriwardena; Carol Espy-Wilson

arXiv:2210.16450 · eess.AS · November 1, 2022 · 1 citation

TL;DR

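This paper introduces a temporal convolution based acoustic-to-articulatory speech inversion (SI) system that uses acoustically derived source features (aperiodicity, periodicity, and pitch) as additional targets. It reports an improvement of close to 28% on the HPRC dataset and outperforms the current best performing SI models by around 9% on the XRMB dataset.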

Contribution

It proposes a speech inversion (SI) system that adds source features as additional targets and uses a temporal convolution architecture, with auditory spectrograms as input, to model long-range dependencies and interactions between the source and the vocal tract.

Findings

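01 Using source features as additional targets improves the proposed SI system by close to 28% on the HPRC dataset.

02 The same SI system outperforms the current best performing SI models by around 9% on the XRMB dataset.

03 Source features and temporal convolution are both shown to be effective for the SI task, with comparisons against three baseline SI model architectures.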

Abstract

In this work, we incorporate acoustically derived source features (aperiodicity, periodicity, and pitch) as additional targets for an acoustic-to-articulatory speech inversion (SI) system. We also propose a temporal convolution based SI system, which uses auditory spectrograms as the input speech representation, to learn long-range dependencies and complex interactions between the source and the vocal tract and thereby improve the SI task. Experiments are conducted on both the Wisconsin X-ray microbeam (XRMB) and Haskins Production Rate Comparison (HPRC) datasets, with comparisons against three baseline SI model architectures. On the HPRC dataset, the proposed SI system improves by close to 28% when the source features are used as additional targets. The same SI system outperforms the current best performing SI models by around 9% on the XRMB dataset.
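As a rough illustration of the idea in the abstract, the sketch below stacks dilated 1-D convolutions over time and projects every frame onto a joint target vector: articulatory targets followed by the three source features. It is not the authors' architecture. The layer sizes, kernel size, dilation schedule, 128-channel input, and the count of six articulatory targets are all assumptions made for illustration; the weights are random and no training is shown.

```python
import numpy as np

# Illustrative sizes only; none of these values come from the paper.
N_FREQ = 128                  # assumed auditory-spectrogram channels per frame
N_TRACT = 6                   # assumed number of articulatory targets
N_SOURCE = 3                  # aperiodicity, periodicity, pitch
N_HIDDEN = 32
KERNEL_SIZE = 3
DILATIONS = [1, 2, 4, 8]


def dilated_conv1d(x, w, b, dilation):
    """'Same'-padded dilated convolution over time (odd kernel sizes only).

    x: (T, C_in), w: (K, C_in, C_out), b: (C_out,). Returns (T, C_out).
    """
    k_size = w.shape[0]
    pad = dilation * (k_size - 1) // 2
    xp = np.pad(x, ((pad, pad), (0, 0)))
    n_frames = x.shape[0]
    out = np.zeros((n_frames, w.shape[2])) + b
    for k in range(k_size):
        start = k * dilation
        out += xp[start:start + n_frames] @ w[k]
    return out


def receptive_field(kernel_size, dilations):
    """Number of input frames that can influence one output frame."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)


def init_params(rng, n_in, n_hidden, n_out, kernel_size, dilations):
    """Random weights for the dilated stack plus a per-frame linear head."""
    layers, c_in = [], n_in
    for _ in dilations:
        w = rng.normal(0.0, 0.1, size=(kernel_size, c_in, n_hidden))
        layers.append((w, np.zeros(n_hidden)))
        c_in = n_hidden
    head = (rng.normal(0.0, 0.1, size=(1, n_hidden, n_out)), np.zeros(n_out))
    return layers, head


def forward(x, layers, head, dilations):
    """Map (T, n_in) input frames to (T, n_out) joint targets."""
    h = x
    for (w, b), d in zip(layers, dilations):
        h = np.maximum(dilated_conv1d(h, w, b, d), 0.0)  # ReLU
    w, b = head
    return dilated_conv1d(h, w, b, 1)  # kernel size 1: per-frame projection


rng = np.random.default_rng(0)
spec = rng.normal(size=(200, N_FREQ))  # stand-in for an auditory spectrogram
layers, head = init_params(
    rng, N_FREQ, N_HIDDEN, N_TRACT + N_SOURCE, KERNEL_SIZE, DILATIONS
)
pred = forward(spec, layers, head, DILATIONS)

# Split the joint output into articulatory and source-feature trajectories.
tract_pred, source_pred = pred[:, :N_TRACT], pred[:, N_TRACT:]
```

With kernel size 3 and dilations 1, 2, 4, 8, `receptive_field` gives 31 frames per output frame, which shows how a short dilated stack can reach long-range temporal context. Predicting the source features from the same shared layers is one simple way to realize "additional targets"; the paper's actual design may differ.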

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing

Methods: Convolution