Fine-Tuning ASR for Stuttered Speech: Personalized vs. Generalized Approaches
Dena Mujtaba, Nihar Mahapatra

TL;DR
This paper compares personalized and generalized fine-tuning of automatic speech recognition (ASR) systems for transcribing stuttered speech. It shows that personalized models significantly improve accuracy, especially on spontaneous speech.
Contribution
The paper presents a comparative analysis of personalized versus generalized fine-tuning of ASR models on stuttered speech and highlights the benefits of personalization.
Findings
Personalized ASRs reduce word error rates more than generalized models.
Personalization is especially effective in spontaneous speech contexts.
Tailored models enhance accessibility for people who stutter.
Abstract
Stuttering -- characterized by involuntary disfluencies such as blocks, prolongations, and repetitions -- is often misinterpreted by automatic speech recognition (ASR) systems, resulting in elevated word error rates and making voice-driven technologies inaccessible to people who stutter. The variability of disfluencies across speakers and contexts further complicates ASR training, compounded by limited annotated stuttered speech data. In this paper, we investigate fine-tuning ASRs for stuttered speech, comparing generalized models (trained across multiple speakers) to personalized models tailored to individual speech characteristics. Using a diverse range of voice-AI scenarios, including virtual assistants and video interviews, we evaluate how personalization affects transcription accuracy. Our findings show that personalized ASRs significantly reduce word error rates, especially in…
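Word error rate (WER), the metric the abstract reports, is conventionally the word-level edit distance between a reference transcript and the ASR output, divided by the number of reference words. The sketch below is a minimal illustration of that standard definition, not the authors' evaluation pipeline. The function name `word_error_rate`, the lowercase-and-split normalization, and the example sentences are assumptions made here for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Assumption: text is normalized only by lowercasing and whitespace
    splitting; a real evaluation would likely normalize more carefully.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from output
            insertion = curr[j - 1] + 1  # extra word in the output
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical example: word repetitions transcribed literally show up
# as insertions relative to the intended utterance.
reference = "turn on the kitchen lights"
hypothesis = "turn turn on the the kitchen lights"
wer = word_error_rate(reference, hypothesis)
print(f"WER = {wer:.2f}")
```

In this made-up example the two repeated words count as two insertions against a five-word reference, which illustrates how literally transcribed repetitions can raise WER even when every intended word is recognized.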
