Masked Autoencoders Are Articulatory Learners

Ahmed Adel Attia; Carol Espy-Wilson

arXiv:2210.15195·eess.AS·May 19, 2023

TL;DR

This paper introduces a deep learning method that uses Masked Autoencoders to reconstruct mistracked articulatory recordings in the XRMB speech dataset, recovering data that was previously unusable for speech research.

Contribution

The study presents a novel application of Masked Autoencoders to recover corrupted articulatory data, enabling the use of previously unusable recordings in speech analysis.

Findings

01. Reconstructed mistracked articulatory trajectories for 41 of the 47 XRMB speakers.

02. Recovered 3.28 hours of previously unusable data.

03. Reconstructed trajectories closely match the ground-truth articulatory trajectories.

Abstract

Articulatory recordings track the positions and motion of different articulators along the vocal tract and are widely used to study speech production and to develop speech technologies such as articulatory-based speech synthesizers and speech inversion systems. The University of Wisconsin X-Ray Microbeam (XRMB) dataset is one of several datasets that provide articulatory recordings synced with audio recordings. The XRMB articulatory recordings employ pellets placed on a number of articulators, which can be tracked by the microbeam. However, a significant portion of the articulatory recordings are mistracked and have so far been unusable. In this work, we present a deep-learning-based approach using Masked Autoencoders to accurately reconstruct the mistracked articulatory recordings for 41 out of 47 speakers of the XRMB dataset. Our model is able to reconstruct articulatory trajectories…
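The masking-and-reconstruction idea behind this approach can be illustrated with a small sketch. This is not the authors' implementation: the array shape, the span-based masking scheme, the function names (`mask_frames`, `masked_mse`, `interpolate_baseline`), and the synthetic sine-wave "trajectories" are all assumptions made for illustration, and simple linear interpolation stands in for the trained Masked Autoencoder. The sketch hides spans of frames in each channel, fills them back in, and scores the reconstruction only on the hidden entries.

```python
import numpy as np

def mask_frames(traj, mask_ratio=0.25, span=10, seed=0):
    """Hide random contiguous spans of frames in each channel.

    traj: array of shape (frames, channels), a hypothetical layout.
    Returns (masked, mask); mask is True where values were hidden.
    """
    rng = np.random.default_rng(seed)
    n_frames, n_channels = traj.shape
    mask = np.zeros(traj.shape, dtype=bool)
    n_spans = max(1, int(mask_ratio * n_frames / span))
    for ch in range(n_channels):
        starts = rng.integers(0, max(1, n_frames - span + 1), size=n_spans)
        for s in starts:
            mask[s:s + span, ch] = True
    masked = np.where(mask, 0.0, traj)  # hidden entries are replaced by zeros
    return masked, mask

def masked_mse(pred, target, mask):
    """Mean squared error computed over the hidden entries only."""
    diff = (pred - target)[mask]
    return float(np.mean(diff ** 2))

def interpolate_baseline(masked, mask):
    """Stand-in for a trained model: linear interpolation over hidden frames.

    Assumes every channel keeps at least one visible frame.
    """
    out = masked.copy()
    t = np.arange(masked.shape[0])
    for ch in range(masked.shape[1]):
        known = ~mask[:, ch]
        out[:, ch] = np.interp(t, t[known], masked[known, ch])
    return out

# Synthetic, slowly varying "trajectories": 200 frames, 4 hypothetical channels.
t = np.arange(200)[:, None]
phases = np.linspace(0.0, np.pi, 4)[None, :]
traj = 2.0 + np.sin(2 * np.pi * t / 400 + phases)

masked, mask = mask_frames(traj)
recon = interpolate_baseline(masked, mask)
print("error with hidden frames left at zero:", masked_mse(masked, traj, mask))
print("error after interpolation baseline:", masked_mse(recon, traj, mask))
```

Scoring only the hidden entries mirrors the usual Masked Autoencoder training objective, where the loss is taken over the masked portion of the input. A learned model would replace `interpolate_baseline` in this sketch.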

Code & Models

Repositories

ahmedadelattia/mae_articulatory_learners (TensorFlow, official)

Taxonomy

Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Music and Audio Processing