On the Difficulty of Token-Level Modeling of Dysfluency and Fluency Shaping Artifacts

Kashaf Gulzar; Dominik Wagner; Sebastian P. Bayerl; Florian Hönig; Tobias Bocklet; Korbinian Riedhammer

arXiv:2512.02027·eess.AS·December 3, 2025

Open Access

TL;DR

This paper introduces a lightweight adaptation method for end-to-end ASR systems to better recognize dysfluencies and fluency-shaping artifacts, highlighting challenges in multilingual settings and English-centric tokenization biases.

Contribution

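It presents a parameter-efficient adaptation approach that lets an E2E ASR system decode dysfluencies and fluency-shaping artifacts as special tokens within the transcription, along with a multi-step fine-tuning strategy with language-adaptive pretraining intended to reduce the performance gap between English and German.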

Findings

01. Effective dysfluency-aware ASR with lightweight adaptation techniques

02. Identification of English-centric tokenization biases affecting multilingual ASR

03. Limitations in current multilingual E2E systems for dysfluency modeling

Abstract

Automatic transcription of stuttered speech remains a challenge, even for modern end-to-end (E2E) automatic speech recognition (ASR) frameworks. Dysfluencies and fluency-shaping artifacts are often overlooked, resulting in non-verbatim transcriptions with limited clinical and research value. We propose a parameter-efficient adaptation method to decode dysfluencies and fluency modifications as special tokens within transcriptions, evaluated on simulated (LibriStutter, English) and natural (KSoF, German) stuttered speech datasets. To mitigate ASR performance disparities and bias towards English, we introduce a multi-step fine-tuning strategy with language-adaptive pretraining. Tokenization analysis further highlights the tokenizer's English-centric bias, which poses challenges for improving performance on German data. Our findings demonstrate the effectiveness of lightweight adaptation…
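One common way to quantify the kind of tokenizer bias mentioned in the abstract is "fertility", the average number of subword tokens produced per word; a tokenizer skewed towards English tends to split German words into many more pieces. The sketch below is only an illustration of that metric and is not the paper's analysis: the vocabulary `TOY_VOCAB`, the greedy tokenizer, and the word lists are all hypothetical stand-ins for a real ASR tokenizer and real transcripts.

```python
from typing import Callable, Iterable, List, Set

# Hypothetical, English-skewed subword vocabulary (an assumption, not from the paper).
TOY_VOCAB: Set[str] = {"the", "speech", "is", "record", "ed", "ing"}


def greedy_tokenize(word: str, vocab: Set[str] = TOY_VOCAB) -> List[str]:
    """Greedy longest-match segmentation; unknown spans fall back to single characters."""
    tokens: List[str] = []
    i = 0
    while i < len(word):
        # Try the longest candidate first; a single character always matches.
        for j in range(len(word), i, -1):
            if word[i:j] in vocab or j == i + 1:
                tokens.append(word[i:j])
                i = j
                break
    return tokens


def fertility(words: Iterable[str],
              tokenize: Callable[[str], List[str]] = greedy_tokenize) -> float:
    """Average number of subword tokens per word."""
    words = list(words)
    if not words:
        raise ValueError("need at least one word")
    return sum(len(tokenize(w)) for w in words) / len(words)


# Hypothetical example words, chosen only to illustrate the effect.
english_words = ["the", "speech", "is", "recorded"]
german_words = ["die", "sprache", "ist"]

print(fertility(english_words))  # 1.25
print(fertility(german_words))   # 4.0
```

With a real system, `tokenize` would be replaced by the ASR model's own tokenizer; a higher fertility on German text means longer target sequences and less informative subword units for the decoder.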

Taxonomy

Topics: Stuttering Research and Treatment · Voice and Speech Disorders · Speech Recognition and Synthesis