Estimating articulatory movements in speech production with transformer networks

Sathvik Udupa; Anwesha Roy; Abhayjeet Singh; Aravind Illa; Prasanta Kumar Ghosh

arXiv:2104.05017·eess.AS·June 15, 2021

TL;DR

This paper applies transformer-based models to estimating articulatory movements from speech acoustics and from phonemes, reporting relative gains in correlation coefficient (CC) over existing methods on both tasks, along with computational benefits of the transformer architecture.

Contribution

It uses FastSpeech, a transformer architecture with explicit duration modelling, to learn hard alignments between phonemes and articulatory movements for phoneme-to-articulatory (PTA) estimation, and it also trains a transformer model for acoustic-to-articulatory inversion (AAI).
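To make the idea of explicit duration modelling concrete, here is a minimal sketch of a FastSpeech-style length regulator: each phoneme-level vector is repeated for its duration in frames, which gives a hard phoneme-to-frame alignment. The function name `length_regulate`, the toy vectors, and the durations are illustrative assumptions and are not taken from the paper or its repository.

```python
from typing import List, Sequence


def length_regulate(
    phoneme_states: Sequence[Sequence[float]],
    durations: Sequence[int],
) -> List[List[float]]:
    """Repeat each phoneme-level vector `durations[i]` times (hypothetical helper).

    The result has one vector per output frame, so the phoneme sequence
    and the articulatory trajectory end up with the same length.
    """
    if len(phoneme_states) != len(durations):
        raise ValueError("need exactly one duration per phoneme")
    frames: List[List[float]] = []
    for state, n_frames in zip(phoneme_states, durations):
        if n_frames < 0:
            raise ValueError("durations must be non-negative")
        # Copy the vector so frames do not share a mutable list.
        frames.extend(list(state) for _ in range(n_frames))
    return frames


# Toy example: 3 phonemes with 2-dimensional states (made-up values).
states = [[0.1, 0.2], [0.5, 0.6], [0.9, 1.0]]
durations = [2, 1, 3]  # frames per phoneme (made-up values)
frames = length_regulate(states, durations)
```

In this sketch the durations are given; in a FastSpeech-style model they would come from a duration predictor at inference time.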

Findings

01 154% relative improvement in correlation coefficient (CC) for subject-dependent PTA estimation

02 Up to 3.1% gain in CC on the AAI task

03 Demonstrates computational benefits of the transformer architecture

Abstract

We estimate articulatory movements in speech production from different modalities - acoustics and phonemes. Acoustic-to-articulatory inversion (AAI) is a sequence-to-sequence task. On the other hand, phoneme-to-articulatory (PTA) motion estimation faces a key challenge in reliably aligning the text and the articulatory movements. To address this challenge, we explore the use of a transformer architecture - FastSpeech, with explicit duration modelling to learn hard alignments between the phonemes and articulatory movements. We also train a transformer model on AAI. We use correlation coefficient (CC) and root mean squared error (rMSE) to assess the estimation performance in comparison to existing methods on both tasks. We observe 154%, 11.8% and 4.8% relative improvement in CC with subject-dependent, pooled and fine-tuning strategies, respectively, for PTA estimation. Additionally, on the…
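The two evaluation metrics named in the abstract can be sketched as follows for a single articulatory trajectory. This is a minimal illustration that assumes CC is the Pearson correlation between predicted and reference trajectories and that relative improvement is measured against a baseline CC. The function names and sample values are hypothetical and are not results from the paper.

```python
import math
from typing import Sequence


def pearson_cc(pred: Sequence[float], target: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length trajectories."""
    n = len(pred)
    if n != len(target) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_p = sum(pred) / n
    mean_t = sum(target) / n
    cov = sum((p - mean_p) * (t - mean_t) for p, t in zip(pred, target))
    var_p = sum((p - mean_p) ** 2 for p in pred)
    var_t = sum((t - mean_t) ** 2 for t in target)
    if var_p == 0 or var_t == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_p * var_t)


def rmse(pred: Sequence[float], target: Sequence[float]) -> float:
    """Root mean squared error between two equal-length trajectories."""
    if len(pred) != len(target) or not pred:
        raise ValueError("need two non-empty sequences of equal length")
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred))


def relative_improvement(new: float, baseline: float) -> float:
    """Relative change of `new` over `baseline`, in percent."""
    return 100.0 * (new - baseline) / baseline


# Made-up trajectory: the prediction is the reference shifted by 0.5.
target = [1.0, 2.0, 3.0, 4.0]
pred = [1.5, 2.5, 3.5, 4.5]
cc = pearson_cc(pred, target)
err = rmse(pred, target)
```

A constant offset leaves CC unchanged but raises the error, which is one reason the two metrics are usually reported together.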

Code & Models

Repositories

bloodraven66/aai_pta_transformers
PyTorch · Official
