Audio Data Augmentation for Acoustic-to-articulatory Speech Inversion using Bidirectional Gated RNNs

Yashish M. Siriwardena; Ahmed Adel Attia; Ganesh Sivaraman; Carol Espy-Wilson

arXiv:2205.13086·eess.AS·June 2, 2023·5 cites

TL;DR

This paper demonstrates that data augmentation and a bidirectional Gated Recurrent Neural Network improve the accuracy of acoustic-to-articulatory speech inversion, enhancing performance on both noisy and clean speech data.

Contribution

It introduces a bidirectional Gated Recurrent Neural Network for speech inversion and compares various data augmentation techniques, showing their effectiveness.

Findings

01. 5% relative improvement in correlation over baseline for clean speech

02. 6% average correlation increase with speaker adaptation

03. Effective noise robustness in speech inversion
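
The correlation figures above can be read with a small worked example. The sketch below assumes the correlation is the Pearson product-moment correlation between an estimated trajectory and a reference trajectory, which this page does not state. The function names and the scores 0.80 and 0.84 are hypothetical, chosen only to show what a 5% relative improvement means; they are not results from the paper.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


def relative_improvement(baseline, new):
    """Relative change of `new` over `baseline`, as a fraction."""
    return (new - baseline) / baseline


# Hypothetical scores, for illustration only (not taken from the paper).
baseline_corr = 0.80
augmented_corr = 0.84
gain = relative_improvement(baseline_corr, augmented_corr)
print(f"relative improvement: {gain:.1%}")
```

Under this reading, a relative improvement is measured against the baseline score rather than as an absolute difference in correlation points.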

Abstract

Data augmentation has proven to be a promising way to improve the performance of deep learning models by adding variability to training data. In previous work on developing a noise-robust acoustic-to-articulatory speech inversion system, we have shown the importance of noise augmentation for improving the performance of speech inversion on noisy speech. In this work, we compare and contrast different ways of doing data augmentation and show how this technique improves the performance of articulatory speech inversion not only on noisy speech, but also on clean speech data. We also propose a Bidirectional Gated Recurrent Neural Network as the speech inversion system instead of the previously used feed-forward neural network. The inversion system uses mel-frequency cepstral coefficients (MFCCs) as the input acoustic features and six vocal tract variables (TVs) as the output…
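
As an illustration of the noise augmentation mentioned in the abstract, the sketch below mixes a noise recording into a clean signal at a chosen signal-to-noise ratio (SNR). This is a common recipe and an assumption here: the abstract does not say how the noise was mixed or which SNRs were used. The function name `add_noise_at_snr`, the synthetic sine "speech", and the 10 dB target are all hypothetical.

```python
import numpy as np


def add_noise_at_snr(speech, noise, snr_db):
    """Return speech plus noise, with noise scaled to the requested SNR in dB."""
    speech = np.asarray(speech, dtype=np.float64)
    noise = np.asarray(noise, dtype=np.float64)

    # Repeat or trim the noise so it covers the whole speech signal.
    reps = int(np.ceil(len(speech) / len(noise)))
    noise = np.tile(noise, reps)[: len(speech)]

    speech_power = np.mean(speech ** 2)
    noise_power = np.mean(noise ** 2)
    if speech_power == 0 or noise_power == 0:
        raise ValueError("speech and noise must both have non-zero power")

    # Choose gain so that speech_power / (gain**2 * noise_power) = 10**(snr_db / 10).
    gain = np.sqrt(speech_power / (noise_power * 10 ** (snr_db / 10)))
    return speech + gain * noise


# Synthetic stand-ins for real recordings (assumed 16 kHz sample rate).
rng = np.random.default_rng(0)
t = np.arange(16000) / 16000
clean = 0.5 * np.sin(2 * np.pi * 220 * t)
noise = rng.standard_normal(8000)
noisy = add_noise_at_snr(clean, noise, snr_db=10.0)
```

In a training pipeline, each clean utterance would typically be paired with several noisy copies at different SNRs, while the target trajectories stay unchanged, since the noise affects only the acoustic input.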

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing