Zero-Shot Recognition of Dysarthric Speech Using Commercial Automatic Speech Recognition and Multimodal Large Language Models
Ali Alsayegh, Tariq Masood

TL;DR
This study evaluates commercial speech recognition and multimodal large language models on dysarthric speech, revealing severity-dependent performance gaps and potential for semantic preservation despite high error rates.
Contribution
It provides the first empirical baseline comparison of eight commercial ASR and MLLM systems on dysarthric speech, highlighting their strengths and limitations.
Findings
Mild dysarthria systems achieve 3-5% WER, close to typical speech.
Severe dysarthria results in over 49% WER across all systems.
GPT-4o reduces WER by 7.36 percentage points with consistent improvements.
Abstract
Voice-based human-machine interaction is a primary modality for accessing intelligent systems, yet individuals with dysarthria face systematic exclusion due to recognition performance gaps. Whilst automatic speech recognition (ASR) achieves word error rates (WER) below 5% on typical speech, performance degrades dramatically for dysarthric speakers. Multimodal large language models (MLLMs) offer potential for leveraging contextual reasoning to compensate for acoustic degradation, yet their zero-shot capabilities remain uncharacterised. This study evaluates eight commercial speech-to-text services on the TORGO dysarthric speech corpus: four conventional ASR systems (AssemblyAI, Whisper large-v3, Deepgram Nova-3, Nova-3 Medical) and four MLLM-based systems (GPT-4o, GPT-4o Mini, Gemini 2.5 Pro, Gemini 2.5 Flash). Evaluation encompasses lexical accuracy, semantic preservation, and…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsVoice and Speech Disorders · Speech Recognition and Synthesis · Phonocardiography and Auscultation Techniques
