A Multi-modal Approach to Dysarthria Detection and Severity Assessment Using Speech and Text Information

Anuprabha M; Krishna Gurugubelli; V Kesavaraj; Anil Kumar Vuppala

arXiv:2412.16874·cs.AI·April 29, 2025

Open Access

TL;DR

This paper presents a multi-modal approach that combines speech and text through cross-attention to improve dysarthria detection and severity assessment, reaching 99.53% detection accuracy and 98.12% severity-assessment accuracy in the speaker-dependent setting on the UA-Speech database.

Contribution

It introduces a cross-attention-based method that integrates the speech and text modalities for more accurate dysarthria detection and severity assessment, an approach the authors describe as novel in this domain.

Findings

01

Detection accuracy of 99.53% (speaker-dependent)

02

Severity assessment accuracy of 98.12% (speaker-dependent)

03

Enhanced robustness by combining speech and text modalities

Abstract

Automatic detection and severity assessment of dysarthria are crucial for delivering targeted therapeutic interventions to patients. While most existing research focuses primarily on the speech modality, this study introduces a novel approach that leverages both speech and text modalities. By employing a cross-attention mechanism, our method learns the acoustic and linguistic similarities between speech and text representations. This approach specifically assesses pronunciation deviations across different severity levels, thereby enhancing the accuracy of dysarthria detection and severity assessment. All experiments were performed using the UA-Speech dysarthric database. Improved accuracies of 99.53% and 93.20% for detection, and 98.12% and 51.97% for severity assessment, were achieved in the speaker-dependent and speaker-independent, unseen- and seen-word settings. These…
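The cross-attention step mentioned in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' architecture: the direction of attention (speech frames as queries, text tokens as keys and values), the feature sizes, the random projection matrices, and the names `cross_attention`, `w_q`, `w_k`, and `w_v` are all assumptions made for illustration only.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attention(speech, text, w_q, w_k, w_v):
    """Single-head scaled dot-product cross-attention (illustrative only).

    speech: (T_speech, d_speech) frame-level speech features (queries)
    text:   (T_text, d_text) token-level text features (keys and values)
    Returns the text-informed speech representation and the attention weights.
    """
    q = speech @ w_q                          # (T_speech, d)
    k = text @ w_k                            # (T_text, d)
    v = text @ w_v                            # (T_text, d)
    scores = q @ k.T / np.sqrt(q.shape[-1])   # similarity of each frame to each token
    weights = softmax(scores, axis=-1)        # each row sums to 1 over text tokens
    return weights @ v, weights


# Hypothetical sizes: 50 speech frames (64-dim), 7 text tokens (32-dim), 16-dim head.
rng = np.random.default_rng(0)
speech = rng.normal(size=(50, 64))
text = rng.normal(size=(7, 32))
d = 16
w_q = rng.normal(size=(64, d))
w_k = rng.normal(size=(32, d))
w_v = rng.normal(size=(32, d))

fused, weights = cross_attention(speech, text, w_q, w_k, w_v)
print(fused.shape, weights.shape)
```

In a trained model the projection matrices would be learned, and the fused representation would typically be pooled over time and passed to a classifier for detection or severity level; those downstream details are not specified in the text above.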

Taxonomy

Topics: Voice and Speech Disorders