SSDM: Scalable Speech Dysfluency Modeling

Jiachen Lian; Xuanru Zhou; Zoe Ezzes; Jet Vonk; Brittany Morin; David Baquirin; Zachary Mille; Maria Luisa Gorno Tempini; Gopala Krishna Anumanchipalli

arXiv:2408.16221 · eess.AS · October 7, 2024

TL;DR

SSDM introduces a scalable, end-to-end framework for speech dysfluency modeling using articulatory gestures, a new large-scale corpus, and large language models to improve accuracy and applicability.

Contribution

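The paper presents a scalable dysfluency modeling framework that combines articulatory gestures, a new alignment method, a large simulated corpus, and LLMs. It targets three gaps named in the abstract: the poor scalability of prior systems, the lack of a large-scale dysfluency corpus, and the absence of an effective learning framework.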

Findings

01. Developed a scalable dysfluency alignment method using articulatory gestures.

02. Created a large-scale simulated dysfluency corpus, Libri-Dys.

03. Built an end-to-end dysfluency modeling system leveraging LLMs.

Abstract

Speech dysfluency modeling is the core module for spoken language learning and speech therapy. However, there are three challenges. First, current state-of-the-art solutions (UDM, Lian et al., 2023; H-UDM, Lian and Anumanchipalli, 2024) suffer from poor scalability. Second, there is a lack of a large-scale dysfluency corpus. Third, there is no effective learning framework. In this paper, we propose SSDM: Scalable Speech Dysfluency Modeling, which (1) adopts articulatory gestures as scalable forced alignment; (2) introduces a connectionist subsequence aligner (CSA) to achieve dysfluency alignment; (3) introduces a large-scale simulated dysfluency corpus called Libri-Dys; and (4) develops an end-to-end system by leveraging the power of large language models (LLMs). We expect SSDM to serve as a standard in the area of dysfluency modeling. Demo is available at…
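To make the idea of dysfluency alignment as a subsequence problem concrete, here is a toy sketch. The abstract does not describe how CSA works, so this example swaps in a classic longest-common-subsequence (LCS) dynamic program over word tokens; it is not the paper's method. The function names `lcs_align` and `label_dysfluencies` and the example sentences are assumptions made purely for illustration.

```python
from typing import List, Tuple


def lcs_align(reference: List[str], spoken: List[str]) -> List[Tuple[int, int]]:
    """Return (ref_index, spoken_index) pairs of one longest common subsequence."""
    n, m = len(reference), len(spoken)
    # dp[i][j] holds the LCS length of reference[i:] and spoken[j:].
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if reference[i] == spoken[j]:
                dp[i][j] = dp[i + 1][j + 1] + 1
            else:
                dp[i][j] = max(dp[i + 1][j], dp[i][j + 1])

    # Walk the table from the start to recover the matched index pairs.
    pairs: List[Tuple[int, int]] = []
    i = j = 0
    while i < n and j < m:
        if reference[i] == spoken[j]:
            pairs.append((i, j))
            i += 1
            j += 1
        elif dp[i + 1][j] >= dp[i][j + 1]:
            i += 1  # skip a reference token (it was not spoken)
        else:
            j += 1  # skip a spoken token (it is extra material)
    return pairs


def label_dysfluencies(reference: List[str], spoken: List[str]):
    """Hypothetical helper: split unmatched tokens into extra and missing."""
    pairs = lcs_align(reference, spoken)
    matched_ref = {i for i, _ in pairs}
    matched_spoken = {j for _, j in pairs}
    # Spoken tokens outside the alignment: candidate repetitions or insertions.
    extra = [(j, spoken[j]) for j in range(len(spoken)) if j not in matched_spoken]
    # Reference tokens outside the alignment: candidate deletions.
    missing = [(i, reference[i]) for i in range(len(reference)) if i not in matched_ref]
    return extra, missing


# Illustrative inputs (assumed, not taken from Libri-Dys).
reference = ["please", "call", "stella"]
spoken = ["please", "please", "call", "uh", "stella"]
extra, missing = label_dysfluencies(reference, spoken)
print(extra)    # [(1, 'please'), (3, 'uh')]
print(missing)  # []
```

The reference text is treated as a subsequence of what was actually said, so anything left outside the alignment is flagged as a candidate dysfluency. A real system would operate on acoustic or phonetic representations rather than exact word matches.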

Videos

SSDM: Scalable Speech Dysfluency Modeling · SlidesLive

Taxonomy

Topics: Speech Recognition and Synthesis · Phonetics and Phonology Research · Voice and Speech Disorders