Deep Neural Convolutive Matrix Factorization for Articulatory Representation Decomposition

Jiachen Lian, Alan W Black, Louis Goldstein, Gopala Krishna Anumanchipalli

arXiv:2204.00465 · eess.AS · June 22, 2022

TL;DR

This paper introduces a neural convolutive matrix factorization method to decompose articulatory data into interpretable gestures, bridging articulatory phonology and deep learning for more intelligible speech representations.

Contribution

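It presents a neural convolutive sparse matrix factorization approach that decomposes articulatory data into phonological gestures and gestural scores, improving interpretability; phoneme recognition experiments confirm that the gestural scores carry phonological information.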
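For reference, the classical convolutive sparse factorization that this kind of approach builds on can be written as follows. The notation is an assumption for illustration, not copied from the paper: $X \in \mathbb{R}^{D \times N}$ holds $D$ articulatory channels over $N$ frames, each of the $K$ gestures is a $D \times T$ template whose $t$-th frame forms column $k$ of $W_t \in \mathbb{R}^{D \times K}$, $H \in \mathbb{R}^{K \times N}$ is the gestural score, and $\mathcal{S}_t$ delays $H$ by $t$ frames with zero fill.

$$
\hat{X} = \sum_{t=0}^{T-1} W_t\, \mathcal{S}_t(H), \qquad
\mathcal{S}_t(H)_{:,n} =
\begin{cases}
H_{:,\,n-t} & n \ge t \\
0 & n < t
\end{cases}
$$

$$
\min_{W_0,\dots,W_{T-1},\,H}\; \frac{1}{2}\,\bigl\lVert X - \hat{X} \bigr\rVert_F^2 + \lambda\,\lVert H \rVert_1
$$

The $\ell_1$ term, weighted by $\lambda$, is the sparsity constraint that pushes each gesture to be active in only a few frames.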

Findings

1. Gestural scores encode phonological information effectively.
2. The method improves the interpretability of speech representations.
3. Results demonstrate successful decomposition of articulatory signals.

Abstract

Most research on data-driven speech representation learning has focused on raw audio in an end-to-end manner, paying little attention to its internal phonological or gestural structure. This work investigates speech representations derived from articulatory kinematics signals and uses a neural implementation of convolutive sparse matrix factorization to decompose the articulatory data into interpretable gestures and gestural scores. By applying sparsity constraints, the gestural scores leverage the discrete combinatorial properties of phonological gestures. Phoneme recognition experiments were also performed to show that the gestural scores indeed encode phonological information. The proposed work thus builds a bridge between articulatory phonology and deep neural networks toward informative, intelligible, interpretable, and efficient speech representations.
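A minimal NumPy sketch of the decomposition the abstract describes, assuming the classical multiplicative-update form of sparse convolutive non-negative matrix factorization (in the style of Smaragdis's convolutive NMF) rather than the paper's neural implementation. The function names (`shift_right`, `shift_left`, `reconstruct`, `sparse_cnmf`) and the default `lam` are hypothetical and are not taken from the berkeley-speech-group/ema_gesture repository. Gesture templates are stored as `W` with shape `(T, D, K)` and the gestural score as `H` with shape `(K, N)`. Multiplicative updates require non-negative input, so real EMA trajectories would first need to be offset or rescaled.

```python
import numpy as np

# Hypothetical helper names for illustration; this is not the API of the
# berkeley-speech-group/ema_gesture repository.


def shift_right(H, t):
    """Delay every row of H by t frames, zero-filling the first t columns."""
    out = np.zeros_like(H)
    n = H.shape[1]
    if t < n:
        out[:, t:] = H[:, : n - t]
    return out


def shift_left(X, t):
    """Advance every row of X by t frames, zero-filling the last t columns."""
    out = np.zeros_like(X)
    n = X.shape[1]
    if t < n:
        out[:, : n - t] = X[:, t:]
    return out


def reconstruct(W, H):
    """Convolutive model X_hat = sum_t W[t] @ shift_right(H, t).

    W: (T, D, K) array, K gesture templates of D channels x T frames.
    H: (K, N) array, the gestural score (activation of each gesture).
    """
    return sum(W[t] @ shift_right(H, t) for t in range(W.shape[0]))


def objective(X, W, H, lam):
    """Half squared reconstruction error plus an L1 sparsity penalty on H."""
    return 0.5 * np.sum((X - reconstruct(W, H)) ** 2) + lam * np.sum(np.abs(H))


def sparse_cnmf(X, K, T, lam=0.1, n_iter=200, seed=0, eps=1e-9):
    """Sparse convolutive NMF via multiplicative updates (assumes X >= 0)."""
    rng = np.random.default_rng(seed)
    D, N = X.shape
    W = rng.random((T, D, K))
    H = rng.random((K, N))
    for _ in range(n_iter):
        # H step: ratio of the negative to the positive gradient terms;
        # lam in the denominator is the gradient of the L1 penalty.
        X_hat = reconstruct(W, H)
        num = sum(W[t].T @ shift_left(X, t) for t in range(T))
        den = sum(W[t].T @ shift_left(X_hat, t) for t in range(T)) + lam
        H *= num / (den + eps)
        # W step: all template slices are updated against the same X_hat.
        X_hat = reconstruct(W, H)
        for t in range(T):
            H_t = shift_right(H, t)
            W[t] *= (X @ H_t.T) / (X_hat @ H_t.T + eps)
    return W, H


if __name__ == "__main__":
    # Synthetic data drawn from the model: 2 gestures, 4 channels, 3-frame
    # templates, and a score that is non-zero in roughly 10% of frames.
    rng = np.random.default_rng(1)
    W_true = rng.random((3, 4, 2))
    H_true = (rng.random((2, 60)) < 0.1) * rng.random((2, 60))
    X = reconstruct(W_true, H_true)
    W0, H0 = sparse_cnmf(X, K=2, T=3, n_iter=0)  # the random initialization
    W, H = sparse_cnmf(X, K=2, T=3, n_iter=300)
    print("objective before:", objective(X, W0, H0, 0.1))
    print("objective after: ", objective(X, W, H, 0.1))
```

Because the L1 penalty can be lowered simply by shrinking `H` while growing `W`, practical versions usually renormalize each gesture template after every `W` step and rescale `H` to compensate; that step is omitted here for brevity.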

Code & Models

Repositories

berkeley-speech-group/ema_gesture
PyTorch (official implementation)

Taxonomy

Topics: Phonetics and Phonology Research · Speech and Audio Processing · Hand Gesture Recognition Systems