Fine-Tuning ASR for Stuttered Speech: Personalized vs. Generalized Approaches

Dena Mujtaba; Nihar Mahapatra

arXiv:2506.00853 · cs.SD · August 22, 2025

TL;DR

This paper compares personalized and generalized fine-tuning of automatic speech recognition (ASR) systems for transcribing stuttered speech. It finds that personalized models significantly reduce word error rates, especially in spontaneous speech.

Contribution

The paper presents a comparative analysis of personalized versus generalized fine-tuning for ASR on stuttered speech and highlights the benefits of personalization.

Findings

01

Personalized ASRs reduce word error rates more than generalized models.

02

Personalization is especially effective in spontaneous speech contexts.

03

Tailored models enhance accessibility for people who stutter.

Abstract

Stuttering -- characterized by involuntary disfluencies such as blocks, prolongations, and repetitions -- is often misinterpreted by automatic speech recognition (ASR) systems, resulting in elevated word error rates and making voice-driven technologies inaccessible to people who stutter. The variability of disfluencies across speakers and contexts further complicates ASR training, compounded by limited annotated stuttered speech data. In this paper, we investigate fine-tuning ASRs for stuttered speech, comparing generalized models (trained across multiple speakers) to personalized models tailored to individual speech characteristics. Using a diverse range of voice-AI scenarios, including virtual assistants and video interviews, we evaluate how personalization affects transcription accuracy. Our findings show that personalized ASRs significantly reduce word error rates, especially in…
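The abstract reports results as word error rate (WER). The summary above does not say how the authors computed it, so the following is only a minimal sketch of the standard definition: the word-level edit distance (substitutions, deletions, insertions) divided by the number of reference words. The function name `word_error_rate` and the example sentences are illustrative assumptions, not taken from the paper.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Standard WER: word-level edit distance / number of reference words.

    Assumption: whitespace tokenization and exact string matching, with no
    text normalization. Real evaluations usually normalize case and
    punctuation first.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so
    # far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical example: a word repetition transcribed literally adds two
# inserted words against a five-word reference.
reference = "turn on the kitchen lights"
hypothesis = "turn turn on the the kitchen lights"
print(word_error_rate(reference, hypothesis))  # 0.4
```

Because the denominator is the reference length, inserted words from transcribed repetitions raise WER even when every intended word is recognized correctly.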
