Improved acoustic-to-articulatory inversion using representations from pretrained self-supervised learning models

Sathvik Udupa; Siddarth C; Prasanta Kumar Ghosh

arXiv:2210.16871·eess.AS·November 1, 2022

TL;DR

This paper explores the use of pretrained self-supervised learning features for acoustic-to-articulatory inversion, showing they perform comparably to traditional MFCC features across various model complexities and configurations.

Contribution

It demonstrates that SSL features like TERA and DeCoAR are effective for AAI, offering a viable alternative to traditional acoustic features across different neural network models.

Findings

01. SSL features achieve high correlation scores close to MFCC in AAI tasks.

02. Performance of SSL features is consistent across different model sizes.

03. SSL features work well in subject-specific, pooled, and fine-tuned configurations.

Abstract

In this work, we investigate the effectiveness of pretrained Self-Supervised Learning (SSL) features for learning the mapping for acoustic-to-articulatory inversion (AAI). Signal processing-based acoustic features such as MFCCs have been predominantly used for the AAI task with deep neural networks. With SSL features working well for various other speech tasks such as speech recognition and emotion classification, we examine their efficacy for AAI. We train transformer-based AAI models of 3 different model complexities on SSL features and compare their performance with MFCCs in subject-specific (SS), pooled and fine-tuned (FT) configurations with data from 10 subjects, evaluating with the correlation coefficient (CC) score on the unseen sentence test set. We find that acoustic feature reconstruction objective-based SSL features such as TERA and DeCoAR work well…
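The abstract names the correlation coefficient (CC) as the evaluation score. The sketch below shows one common way such a score can be computed: the Pearson correlation between predicted and reference trajectories for each articulatory dimension, averaged over dimensions. The averaging step, the function names `pearson_cc` and `mean_cc`, and the frames-by-dimensions list layout are assumptions made for illustration; the abstract does not specify them, and this is not taken from the authors' repository.

```python
import math
from typing import Sequence


def pearson_cc(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between two equal-length trajectories."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("trajectories must have equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        # Correlation is undefined for a constant trajectory.
        return float("nan")
    return cov / math.sqrt(var_x * var_y)


def mean_cc(pred: Sequence[Sequence[float]],
            target: Sequence[Sequence[float]]) -> float:
    """Average per-dimension CC (assumed aggregation, see lead-in).

    pred and target are lists of frames; each frame is a list holding
    one value per articulatory dimension.
    """
    n_dims = len(target[0])
    scores = [
        pearson_cc([frame[d] for frame in pred],
                   [frame[d] for frame in target])
        for d in range(n_dims)
    ]
    return sum(scores) / n_dims
```

For example, `mean_cc(predicted_frames, reference_frames)` would give a single score per test utterance, which could then be averaged over the unseen-sentence test set; that final averaging is likewise an assumption.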

Code & Models

Repositories

bloodraven66/ssl_aai (PyTorch, official)

Taxonomy

Topics: Speech Recognition and Synthesis · Phonetics and Phonology Research · Speech and Audio Processing

Methods: Test