DARS: Dysarthria-Aware Rhythm-Style Synthesis for ASR Enhancement

Minghui Wu; Xueling Liu; Jiahuan Fan; Haitao Tang; Yanyong Zhang; Yue Zhang

arXiv:2603.01369 · cs.SD · March 3, 2026

Open Access

TL;DR

DARS is a dysarthria-aware speech synthesis framework that models pathological rhythm and acoustic style to generate augmentation data for ASR on dysarthric speech, with a reported 54.22% reduction in word error rate (WER).

Contribution

It introduces a multi-stage rhythm predictor and a dysarthric-style conditional flow matching mechanism, both built on the Matcha-TTS architecture and designed specifically for dysarthric speech augmentation.
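For readers unfamiliar with the flow matching objective used by Matcha-TTS, the following is a minimal sketch of how one training pair is built under the optimal-transport conditional flow matching formulation. It is illustrative only: this summary does not describe how DARS injects dysarthric style into the objective, so no conditioning is shown, and the names `cfm_training_pair`, `cfm_loss`, and `sigma_min` are assumptions made for this example rather than identifiers from the paper.

```python
import numpy as np


def cfm_training_pair(x1, rng, sigma_min=1e-4):
    """Build one (x_t, t, u_t) training triple for OT conditional flow matching.

    x1 is a target feature array (for example a mel-spectrogram); the shape is
    arbitrary. The noise sample x0 and the time t are drawn at random.
    """
    x0 = rng.standard_normal(x1.shape)   # noise sample from N(0, I)
    t = rng.uniform()                    # time in [0, 1)
    # Point on the straight-line path between noise and data.
    x_t = (1.0 - (1.0 - sigma_min) * t) * x0 + t * x1
    # Target velocity the network is trained to predict at (x_t, t).
    u_t = x1 - (1.0 - sigma_min) * x0
    return x_t, t, u_t


def cfm_loss(predicted_velocity, u_t):
    """Mean squared error between predicted and target velocity."""
    return float(np.mean((predicted_velocity - u_t) ** 2))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    mel = rng.standard_normal((80, 120))          # hypothetical 80-bin mel, 120 frames
    x_t, t, u_t = cfm_training_pair(mel, rng)
    # A real model would predict the velocity from (x_t, t, conditioning);
    # a zero prediction is used here only to exercise the loss.
    print(cfm_loss(np.zeros_like(u_t), u_t))
```

In a full system the velocity would come from a neural network that also receives the text encoding and any style information; the sketch only shows the regression target.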

Findings

1. Achieves a Mel Cepstral Distortion (MCD) of 4.29, closely matching real dysarthric speech.
2. Reduces word error rate (WER) by 54.22% when used for ASR data augmentation.
3. Demonstrates effectiveness on the TORGO dataset.
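
As background for the WER finding, here is a minimal sketch of how word error rate and a relative reduction between two systems are commonly computed. This summary does not state whether the 54.22% figure is a relative or an absolute reduction, so the example makes no claim about it; the function names and the sample sentences are hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference prefix seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


def relative_reduction(baseline_wer: float, new_wer: float) -> float:
    """Relative WER reduction in percent, measured against the baseline."""
    if baseline_wer <= 0:
        raise ValueError("baseline WER must be positive")
    return 100.0 * (baseline_wer - new_wer) / baseline_wer


if __name__ == "__main__":
    # Hypothetical transcripts: one substitution and one deletion over six words.
    print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
    # Hypothetical baseline and augmented WERs.
    print(relative_reduction(0.40, 0.20))
```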

Abstract

Dysarthric speech exhibits abnormal prosody and significant speaker variability, presenting persistent challenges for automatic speech recognition (ASR). While text-to-speech (TTS)-based data augmentation has shown potential, existing methods often fail to accurately model the pathological rhythm and acoustic style of dysarthric speech. To address this, we propose DARS, a dysarthria-aware rhythm-style synthesis framework based on the Matcha-TTS architecture. DARS incorporates a multi-stage rhythm predictor optimized by contrastive preferences between normal and dysarthric speech, along with a dysarthric-style conditional flow matching mechanism, jointly enhancing temporal rhythm reconstruction and pathological acoustic style simulation. Experiments on the TORGO dataset demonstrate that DARS achieves a Mel Cepstral Distortion (MCD) of 4.29, closely approximating real dysarthric speech.…
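
To make the MCD figure easier to interpret, the following is a minimal sketch of the conventional MCD formula (reported in dB) applied to two sequences of mel-cepstral coefficients. It assumes the two sequences are already time-aligned frame by frame (for example with dynamic time warping) and that the 0th (energy) coefficient is excluded; the paper's exact feature extraction and alignment settings are not given in this summary, and the function name is hypothetical.

```python
import numpy as np


def mel_cepstral_distortion(ref_mcep, syn_mcep) -> float:
    """Average MCD in dB between two aligned (frames, coefficients) arrays."""
    ref = np.asarray(ref_mcep, dtype=float)
    syn = np.asarray(syn_mcep, dtype=float)
    if ref.ndim != 2 or ref.shape != syn.shape:
        raise ValueError("inputs must be aligned 2-D arrays of equal shape")
    # Drop coefficient 0 (overall energy), as is common practice.
    diff = ref[:, 1:] - syn[:, 1:]
    # Per-frame distortion: sqrt(2 * sum of squared coefficient differences).
    per_frame = np.sqrt(2.0 * np.sum(diff ** 2, axis=1))
    # Scale by 10 / ln(10) to express the result in dB, then average over frames.
    return float((10.0 / np.log(10.0)) * np.mean(per_frame))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    real = rng.standard_normal((200, 25))                      # hypothetical real features
    synthetic = real + 0.1 * rng.standard_normal(real.shape)   # hypothetical synthetic features
    print(mel_cepstral_distortion(real, synthetic))
```

Lower values indicate that the synthetic features lie closer to the reference features; identical inputs give 0.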

Taxonomy

Topics: Voice and Speech Disorders · Speech Recognition and Synthesis · Music and Audio Processing