On the Difficulty of Token-Level Modeling of Dysfluency and Fluency Shaping Artifacts
Kashaf Gulzar, Dominik Wagner, Sebastian P. Bayerl, Florian Hönig, Tobias Bocklet, Korbinian Riedhammer

TL;DR
This paper introduces a lightweight adaptation method for end-to-end ASR systems to better recognize dysfluencies and fluency-shaping artifacts, highlighting challenges in multilingual settings and English-centric tokenization biases.
Contribution
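The paper presents a parameter-efficient adaptation approach that lets an E2E ASR system decode dysfluencies and fluency-shaping artifacts as special tokens within the transcription, along with a multi-step fine-tuning strategy with language-adaptive pretraining intended to reduce the performance gap between English and German.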
Findings
Effective dysfluency-aware ASR with lightweight adaptation techniques
Identification of English-centric tokenization biases affecting multilingual ASR
Limitations in current multilingual E2E systems for dysfluency modeling
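One common way to make a tokenization bias of this kind concrete is "fertility", the average number of subword tokens per word: a tokenizer whose vocabulary mostly covers English tends to split German words into more pieces. The sketch below is only an illustration of that metric. The greedy longest-match tokenizer, the `TOY_VOCAB` set, and both example sentences are assumptions made for this example; they are not the tokenizer, vocabulary, or data analyzed in the paper.

```python
def greedy_tokenize(word, vocab):
    """Greedy longest-match segmentation; unknown spans fall back to single characters."""
    pieces = []
    i = 0
    while i < len(word):
        # Try the longest candidate first; a single character is always accepted.
        for j in range(len(word), i, -1):
            if word[i:j] in vocab or j == i + 1:
                pieces.append(word[i:j])
                i = j
                break
    return pieces


def fertility(sentence, vocab):
    """Average number of subword tokens per whitespace-separated word."""
    words = sentence.lower().split()
    if not words:
        raise ValueError("sentence must contain at least one word")
    n_tokens = sum(len(greedy_tokenize(w, vocab)) for w in words)
    return n_tokens / len(words)


# Hypothetical, English-heavy toy vocabulary (assumption for illustration only).
TOY_VOCAB = {"the", "speech", "is", "fluent", "die", "ist", "sprach", "e"}

english = "the speech is fluent"
german = "die sprache ist flüssig"

print(fertility(english, TOY_VOCAB))  # every English word is a single token
print(fertility(german, TOY_VOCAB))   # German words are split into more pieces
```

A higher fertility means longer token sequences for the same content, which is one plausible reason why a language that is poorly covered by the vocabulary is harder to adapt to.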
Abstract
Automatic transcription of stuttered speech remains a challenge, even for modern end-to-end (E2E) automatic speech recognition (ASR) frameworks. Dysfluencies and fluency-shaping artifacts are often overlooked, resulting in non-verbatim transcriptions with limited clinical and research value. We propose a parameter-efficient adaptation method to decode dysfluencies and fluency modifications as special tokens within transcriptions, evaluated on simulated (LibriStutter, English) and natural (KSoF, German) stuttered speech datasets. To mitigate ASR performance disparities and bias towards English, we introduce a multi-step fine-tuning strategy with language-adaptive pretraining. Tokenization analysis further highlights the tokenizer's English-centric bias, which poses challenges for improving performance on German data. Our findings demonstrate the effectiveness of lightweight adaptation…
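To illustrate what "special tokens within transcriptions" can look like in practice, the following minimal sketch treats a verbatim transcript as ordinary text with inline tags, and derives both a tag count and a tag-free (non-verbatim) version from it. The tag names and the `<name>` syntax are hypothetical; the actual token inventory used in the paper is not given in this summary.

```python
import re
from collections import Counter

# Assumed tag syntax: lowercase names in angle brackets, e.g. "<block>".
TAG_PATTERN = re.compile(r"<[a-z_]+>")


def count_tags(transcript):
    """Count each dysfluency / fluency-shaping tag in a verbatim transcript."""
    return Counter(TAG_PATTERN.findall(transcript))


def strip_tags(transcript):
    """Drop all tags and normalize whitespace, giving a non-verbatim transcript."""
    return " ".join(TAG_PATTERN.sub(" ", transcript).split())


# Hypothetical example utterance with two tagged events.
verbatim = "I <block> want <repetition> to go"

print(count_tags(verbatim))
print(strip_tags(verbatim))
```

Keeping the tags inline means a single decoding pass yields both the words and the positions of the events, while the stripped form remains comparable to a conventional ASR reference for word error rate scoring.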
Taxonomy
Topics: Stuttering Research and Treatment · Voice and Speech Disorders · Speech Recognition and Synthesis
