Bridging ASR and LLMs for Dysarthric Speech Recognition: Benchmarking Self-Supervised and Generative Approaches

Ahmed Aboeitta; Ahmed Sharshar; Youssef Nafea; Shady Shehata

arXiv:2508.08027 · cs.SD · August 12, 2025

Open Access

TL;DR

This paper benchmarks self-supervised ASR models for dysarthric speech, introduces LLM-based decoding to enhance recognition, and analyzes their generalization and error patterns across severity levels.

Contribution

It systematically evaluates ASR architectures for dysarthric speech and introduces LLM-based decoding to improve accuracy and intelligibility.

Findings

01. LLM-enhanced decoding improves recognition accuracy.

02. LLMs help restore phonemes and correct grammar.

03. Models show varying generalization across datasets.

Abstract

Dysarthric speech poses significant challenges for Automatic Speech Recognition (ASR) due to phoneme distortions and high variability. While self-supervised ASR models like Wav2Vec, HuBERT, and Whisper have shown promise, their effectiveness on dysarthric speech remains unclear. This study systematically benchmarks these models with different decoding strategies, including CTC, seq2seq, and LLM-enhanced decoding (BART, GPT-2, Vicuna). Our contributions include (1) benchmarking ASR architectures for dysarthric speech, (2) introducing LLM-based decoding to improve intelligibility, (3) analyzing generalization across datasets, and (4) providing insights into recognition errors across severity levels. Findings highlight that LLM-enhanced decoding improves dysarthric ASR by leveraging linguistic constraints for phoneme restoration and grammatical correction.
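
For readers unfamiliar with the CTC decoding strategy named in the abstract, the following is a minimal sketch of greedy CTC collapsing: consecutive repeated frame labels are merged, then blank symbols are dropped. It is an illustration only and not the authors' implementation; the function name `ctc_greedy_collapse`, the `_` blank symbol, and the toy frame sequence are all assumptions made for this example.

```python
from itertools import groupby

# Hypothetical blank symbol chosen for this illustration.
BLANK = "_"


def ctc_greedy_collapse(frame_labels, blank=BLANK):
    """Collapse per-frame labels: merge consecutive repeats, then drop blanks."""
    # Step 1: merge runs of identical consecutive labels into one label.
    merged = [label for label, _ in groupby(frame_labels)]
    # Step 2: remove the blank symbol, which only separates repeated characters.
    return "".join(label for label in merged if label != blank)


# Toy per-frame argmax output (assumed, not taken from the paper).
frames = list("hh_e_ll_ll_oo")
decoded = ctc_greedy_collapse(frames)
print(decoded)
```

The blank between the two `l` runs is what lets a doubled letter survive the merge step. A transcript produced this way has no language-level constraints, which is the gap that seq2seq and LLM-enhanced decoding are meant to address.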

Taxonomy

Topics: Voice and Speech Disorders · Speech Recognition and Synthesis · Stuttering Research and Treatment