Towards Robust Dysarthric Speech Recognition: LLM-Agent Post-ASR Correction Beyond WER

Xiuwen Zheng; Sixun Dong; Bornali Phukon; Mark Hasegawa-Johnson; Chang D. Yoo

arXiv:2601.21347 · eess.AS · January 30, 2026

Open Access

TL;DR

This paper presents a novel LLM-based post-ASR correction method for dysarthric speech that improves semantic fidelity and reduces WER, supported by a new benchmark dataset and comprehensive evaluation.

Contribution

Introduces a large language model agent for post-ASR correction of dysarthric speech, enhancing semantic accuracy beyond traditional WER metrics.

Findings

01. Achieves a 14.51% WER reduction on dysarthric speech

02. Substantial semantic improvements: +7.59 pp in MENLI and +7.66 pp in Slot Micro F1

03. WER is sensitive to domain shift; semantic metrics better predict downstream performance

Abstract

While Automatic Speech Recognition (ASR) is typically benchmarked by word error rate (WER), real-world applications ultimately hinge on semantic fidelity. This mismatch is particularly problematic for dysarthric speech, where articulatory imprecision and disfluencies can cause severe semantic distortions. To bridge this gap, we introduce a Large Language Model (LLM)-based agent for post-ASR correction: a Judge-Editor over the top-k ASR hypotheses that keeps high-confidence spans, rewrites uncertain segments, and operates in both zero-shot and fine-tuned modes. In parallel, we release SAP-Hypo5, the largest benchmark for dysarthric speech correction, to enable reproducibility and future exploration. Under multi-perspective evaluation, our agent achieves a 14.51% WER reduction alongside substantial semantic gains, including a +7.59 pp improvement in MENLI and +7.66 pp in Slot Micro F1 on…
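The gap between WER and semantic fidelity described in the abstract can be illustrated with a minimal sketch of the standard WER computation (word-level edit distance divided by reference length). The function name `word_error_rate` and the example sentences are hypothetical and are not taken from the paper or from SAP-Hypo5; the sketch assumes simple whitespace tokenization and no text normalization.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the number of reference words.

    Illustrative only: assumes whitespace tokenization and no normalization.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix processed
    # so far and the first j words of the hypothesis.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words when the hypothesis is empty
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical example: one substituted word out of five gives a WER of 0.2,
# yet the command's meaning is inverted.
wer = word_error_rate("turn off the kitchen light", "turn on the kitchen light")
print(f"WER = {wer:.2f}")
```

In this made-up case a single substitution yields a low WER while reversing the intent of the utterance, which is the kind of mismatch that motivates evaluating with semantic metrics alongside WER.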

Taxonomy

Topics: Voice and Speech Disorders · Speech Recognition and Synthesis · Phonetics and Phonology Research