Audio Data Augmentation for Acoustic-to-articulatory Speech Inversion using Bidirectional Gated RNNs
Yashish M. Siriwardena, Ahmed Adel Attia, Ganesh Sivaraman, Carol Espy-Wilson

TL;DR
This paper demonstrates that data augmentation and a bidirectional Gated Recurrent Neural Network improve the accuracy of acoustic-to-articulatory speech inversion, enhancing performance on both noisy and clean speech data.
Contribution
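It introduces a bidirectional Gated Recurrent Neural Network for speech inversion, replacing the previously used feed-forward network, and compares several data augmentation techniques, showing that augmentation improves inversion on both noisy and clean speech.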
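The bidirectional GRU idea can be sketched as follows. A forward GRU reads the MFCC frames left to right, a backward GRU reads them right to left, and a linear layer maps the concatenated hidden states at each frame to six TV values. This is a minimal, untrained, pure-Python sketch. The layer sizes (13 MFCCs, 16 hidden units, one bidirectional layer) and all class and function names are hypothetical. They are not the paper's architecture or hyperparameters.

```python
import math
import random

def matvec(m, v):
    """Multiply matrix m (a list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

class GRUCell:
    """One GRU direction with randomly initialised weights (illustrative, untrained)."""
    def __init__(self, n_in, n_hidden, rng):
        s = 1.0 / math.sqrt(n_hidden)
        def mat(rows, cols):
            return [[rng.uniform(-s, s) for _ in range(cols)] for _ in range(rows)]
        self.n_hidden = n_hidden
        # W: input-to-hidden, U: hidden-to-hidden, for update (z), reset (r), candidate (n)
        self.W = {g: mat(n_hidden, n_in) for g in "zrn"}
        self.U = {g: mat(n_hidden, n_hidden) for g in "zrn"}
        self.b = {g: [0.0] * n_hidden for g in "zrn"}

    def step(self, x, h):
        wx = {g: matvec(self.W[g], x) for g in "zrn"}
        uh = {g: matvec(self.U[g], h) for g in "zrn"}
        z = [sigmoid(a + c + d) for a, c, d in zip(wx["z"], uh["z"], self.b["z"])]
        r = [sigmoid(a + c + d) for a, c, d in zip(wx["r"], uh["r"], self.b["r"])]
        # Reset gate scales the recurrent contribution to the candidate state
        n = [math.tanh(a + ri * c + d) for a, ri, c, d in zip(wx["n"], r, uh["n"], self.b["n"])]
        # Update gate interpolates between the candidate and the previous state
        return [(1 - zi) * ni + zi * hi for zi, ni, hi in zip(z, n, h)]

    def run(self, frames):
        h = [0.0] * self.n_hidden
        states = []
        for x in frames:
            h = self.step(x, h)
            states.append(h)
        return states

class BiGRUInverter:
    """Hypothetical model: sequence of MFCC frames -> per-frame tract variables (TVs)."""
    def __init__(self, n_mfcc=13, n_hidden=16, n_tvs=6, seed=0):
        rng = random.Random(seed)
        self.fwd = GRUCell(n_mfcc, n_hidden, rng)
        self.bwd = GRUCell(n_mfcc, n_hidden, rng)
        s = 1.0 / math.sqrt(2 * n_hidden)
        self.W_out = [[rng.uniform(-s, s) for _ in range(2 * n_hidden)] for _ in range(n_tvs)]
        self.b_out = [0.0] * n_tvs

    def __call__(self, frames):
        h_fwd = self.fwd.run(frames)
        # Run the backward GRU on reversed frames, then reverse back to align in time
        h_bwd = self.bwd.run(frames[::-1])[::-1]
        # Linear read-out from the concatenated [forward, backward] state at each frame
        return [
            [y + b for y, b in zip(matvec(self.W_out, hf + hb), self.b_out)]
            for hf, hb in zip(h_fwd, h_bwd)
        ]

rng = random.Random(1)
mfccs = [[rng.gauss(0.0, 1.0) for _ in range(13)] for _ in range(50)]  # 50 frames x 13 MFCCs
tvs = BiGRUInverter()(mfccs)
print(len(tvs), len(tvs[0]))  # 50 6
```

Because the backward GRU has already read every later frame, each output frame can draw on acoustic context from both the past and the future, which is the usual motivation for bidirectional recurrence in frame-wise regression.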
Findings
5% relative improvement in correlation over baseline for clean speech
6% average correlation increase with speaker adaptation
Improved noise robustness of speech inversion
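The findings are stated as relative improvements in correlation. A minimal sketch of that arithmetic follows. It assumes, since the summary does not say, that "correlation" means Pearson's r between predicted and measured TV trajectories, averaged over the six TVs. The helper names are hypothetical, and the numbers in the usage line are made up for illustration rather than taken from the paper.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def mean_tv_correlation(pred_tvs, true_tvs):
    """Average Pearson r over TV channels (one trajectory per TV)."""
    rs = [pearson(p, t) for p, t in zip(pred_tvs, true_tvs)]
    return sum(rs) / len(rs)

def relative_improvement(new, baseline):
    """Relative change (new - baseline) / baseline; 0.05 means 5%."""
    return (new - baseline) / baseline

# Hypothetical scores, not from the paper: going from 0.80 to 0.84 is a
# 5% relative gain but only a 0.04 absolute gain in correlation.
print(round(relative_improvement(0.84, 0.80), 4))  # 0.05
```

The distinction matters when reading the findings: a relative gain is measured against the baseline score, so the same percentage corresponds to a smaller absolute change when the baseline correlation is low.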
Abstract
Data augmentation has proven to be a promising approach for improving the performance of deep learning models by adding variability to training data. In previous work on developing a noise-robust acoustic-to-articulatory speech inversion system, we have shown the importance of noise augmentation to improve the performance of speech inversion in noisy speech. In this work, we compare and contrast different ways of doing data augmentation and show how this technique improves the performance of articulatory speech inversion not only on noisy speech, but also on clean speech data. We also propose a Bidirectional Gated Recurrent Neural Network as the speech inversion system instead of the previously used feed-forward neural network. The inversion system uses mel-frequency cepstral coefficients (MFCCs) as the input acoustic features and six vocal tract variables (TVs) as the output…
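Noise augmentation of the kind the abstract refers to can be sketched as mixing a noise recording into each clean utterance at a chosen signal-to-noise ratio (SNR). This stdlib-only sketch treats waveforms as lists of floats. The function names, the SNR values and the one-noisy-copy-per-utterance policy are assumptions, not the paper's recipe. The noise is scaled by sqrt(P_speech / (P_noise * 10^(SNR/10))), so the mixture hits the target SNR exactly.

```python
import math
import random

def power(x):
    """Mean power of a signal."""
    return sum(v * v for v in x) / len(x)

def add_noise_at_snr(clean, noise, snr_db):
    """Mix noise into clean speech so that the result has the requested SNR in dB."""
    if len(noise) < len(clean):
        raise ValueError("noise must be at least as long as the clean signal")
    noise = noise[: len(clean)]
    scale = math.sqrt(power(clean) / (power(noise) * 10 ** (snr_db / 10)))
    return [s + scale * n for s, n in zip(clean, noise)]

def augment(dataset, noises, snrs_db, rng):
    """dataset: list of (waveform, tv_targets) pairs (hypothetical format).
    Returns the clean pairs plus one noisy copy of each; TV targets are reused unchanged."""
    out = list(dataset)
    for wave, tv_targets in dataset:
        noise = rng.choice(noises)
        start = rng.randrange(len(noise) - len(wave) + 1)  # random noise segment
        snr_db = rng.choice(snrs_db)
        noisy = add_noise_at_snr(wave, noise[start:start + len(wave)], snr_db)
        out.append((noisy, tv_targets))
    return out

rng = random.Random(0)
clean = [math.sin(0.05 * i) for i in range(1000)]
noise = [rng.gauss(0.0, 1.0) for _ in range(2000)]
train = augment([(clean, "tvs_for_utt_0")], [noise], [0, 5, 10], rng)
print(len(train))  # 2: the clean pair plus one noisy copy
```

The TV targets of each noisy copy are copied unchanged from the clean utterance, since adding acoustic noise does not change the articulation that produced the speech.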
Taxonomy
Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing
