A Deep Joint Sparse Non-negative Matrix Factorization Framework for Identifying the Common and Subject-specific Functional Units of Tongue Motion During Speech

Jonghye Woo; Fangxu Xing; Jerry L. Prince; Maureen Stone; Arnold Gomez; Timothy G. Reese; Van J. Wedeen; Georges El Fakhri

arXiv:2007.04865 · cs.CV · June 8, 2021

TL;DR

This paper introduces a deep learning framework that combines graph-regularized sparse NMF with spectral clustering to identify common and subject-specific functional units of tongue motion during speech, addressing both the complexity of muscle coordination and the variability across subjects.

Contribution

The paper develops a deep joint sparse NMF framework with graph regularization and spectral clustering for interpretable tongue motion analysis, and reports comparable or better clustering performance than existing methods on simulated data.

Findings

01

Achieved comparable or better clustering performance on simulated data.

02

Enhanced interpretability and reduced size variability in in vivo tongue motion analysis.

03

Effectively distinguished common and subject-specific functional units.

Abstract

Intelligible speech is produced by creating varying internal local muscle groupings -- i.e., functional units -- that are generated in a systematic and coordinated manner. There are two major challenges in characterizing and analyzing functional units. First, due to the complex and convoluted nature of tongue structure and function, it is of great importance to develop a method that can accurately decode complex muscle coordination patterns during speech. Second, it is challenging to keep identified functional units across subjects comparable due to their substantial variability. In this work, to address these challenges, we develop a new deep learning framework to identify common and subject-specific functional units of tongue motion during speech. Our framework hinges on joint deep graph-regularized sparse non-negative matrix factorization (NMF) using motion quantities derived from…
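To make the core building block named in the abstract more concrete, the following is a minimal single-layer sketch of graph-regularized sparse NMF with standard multiplicative updates. It is not the authors' deep, joint formulation. The function names (`knn_affinity`, `graph_sparse_nmf`), the k-nearest-neighbour graph, the penalty weights `lam` and `eta`, and the synthetic data are all assumptions made for illustration. The sketch minimizes ||X - U Vᵀ||² + lam · Tr(Vᵀ L V) + eta · ΣV with U, V ≥ 0, where L is the graph Laplacian over the columns of X.

```python
import numpy as np

def knn_affinity(X, k=5):
    """Symmetric binary k-nearest-neighbour graph over the columns of X."""
    n = X.shape[1]
    sq = np.sum(X ** 2, axis=0)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X.T @ X)  # squared distances
    np.fill_diagonal(d2, np.inf)                       # exclude self-matches
    idx = np.argsort(d2, axis=1)[:, :k]
    W = np.zeros((n, n))
    W[np.repeat(np.arange(n), k), idx.ravel()] = 1.0
    return np.maximum(W, W.T)                          # symmetrize

def objective(X, U, V, W, lam, eta):
    """Reconstruction error + graph smoothness + L1 sparsity on V."""
    L = np.diag(W.sum(axis=1)) - W
    return (np.linalg.norm(X - U @ V.T) ** 2
            + lam * np.trace(V.T @ L @ V)
            + eta * V.sum())

def graph_sparse_nmf(X, rank, W, lam=1.0, eta=0.1, n_iter=200, seed=0):
    """Multiplicative updates; U holds building blocks, V holds weights."""
    eps = 1e-10                                        # avoids division by zero
    rng = np.random.default_rng(seed)
    m, n = X.shape
    U = rng.random((m, rank)) + 0.1
    V = rng.random((n, rank)) + 0.1
    D = np.diag(W.sum(axis=1))
    history = [objective(X, U, V, W, lam, eta)]
    for _ in range(n_iter):
        U *= (X @ V) / (U @ (V.T @ V) + eps)
        V *= (X.T @ U + lam * (W @ V)) / (
            V @ (U.T @ U) + lam * (D @ V) + 0.5 * eta + eps)
        history.append(objective(X, U, V, W, lam, eta))
    return U, V, history

# Hypothetical toy data: 6 non-negative features, 40 points in two groups.
rng = np.random.default_rng(1)
protos = np.array([[1.0, 0.9, 0.8, 0.1, 0.0, 0.1],
                   [0.1, 0.0, 0.1, 0.9, 1.0, 0.8]]).T   # shape (6, 2)
members = np.repeat([0, 1], 20)
X = protos[:, members] + 0.05 * rng.random((6, 40))

W = knn_affinity(X, k=5)
U, V, history = graph_sparse_nmf(X, rank=2, W=W)
```

In a pipeline like the one the abstract describes, the weight matrix `V` would then be passed to spectral clustering to obtain the functional units; that step, the deep (multi-layer) stacking, and the joint treatment of multiple subjects are omitted here.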

Taxonomy

Methods: Spectral Clustering · Interpretability