Detecting Dysfluencies in Stuttering Therapy Using wav2vec 2.0
Sebastian P. Bayerl, Dominik Wagner, Elmar Nöth, Korbinian Riedhammer

TL;DR
This study demonstrates that fine-tuning wav2vec 2.0 with multi-task learning significantly improves the detection of various dysfluencies in stuttered speech across different languages and datasets.
Contribution
The paper introduces a novel approach of fine-tuning wav2vec 2.0 for stuttering detection, enhancing its effectiveness for identifying dysfluencies in therapy and research contexts.
Findings
Up to 27% relative F1-score improvement in dysfluency classification.
Effective cross-lingual transfer of stuttering detection.
Multi-task learning boosts feature relevance for specific speech events.
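A percentage improvement in F1-score can be read either as an absolute difference or as a gain relative to a baseline. The minimal sketch below shows both readings. The confusion counts are hypothetical and chosen only for illustration; they are not results from the paper, and the function names are assumptions rather than anything the authors define.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 = 2*TP / (2*TP + FP + FN); returns 0.0 when the score is undefined."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def relative_improvement(baseline: float, improved: float) -> float:
    """Gain of `improved` over `baseline`, as a fraction of the baseline."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (improved - baseline) / baseline


# Hypothetical counts for one dysfluency class (NOT taken from the paper).
baseline_f1 = f1_score(tp=40, fp=30, fn=30)   # classifier on general-purpose features
finetuned_f1 = f1_score(tp=50, fp=20, fn=20)  # classifier on fine-tuned features

absolute_gain = finetuned_f1 - baseline_f1
relative_gain = relative_improvement(baseline_f1, finetuned_f1)

print(f"baseline F1:   {baseline_f1:.3f}")
print(f"fine-tuned F1: {finetuned_f1:.3f}")
print(f"absolute gain: {absolute_gain:.3f}")
print(f"relative gain: {relative_gain:.1%}")
```

With these made-up counts the two readings differ noticeably, which is why it matters which one a reported percentage refers to.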
Abstract
Stuttering is a varied speech disorder that harms an individual's communication ability. Persons who stutter (PWS) often use speech therapy to cope with their condition. Improving speech recognition systems for people with such non-typical speech or tracking the effectiveness of speech therapy would require systems that can detect dysfluencies while at the same time being able to detect speech techniques acquired in therapy. This paper shows that fine-tuning wav2vec 2.0 [1] for the classification of stuttering on a sizeable English corpus containing stuttered speech, in conjunction with multi-task learning, boosts the effectiveness of the general-purpose wav2vec 2.0 features for detecting stuttering in speech, both within and across languages. We evaluate our method on FluencyBank [2] and the German therapy-centric Kassel State of Fluency (KSoF) [3] dataset by training Support Vector…
