Speech Recognition-based Feature Extraction for Enhanced Automatic Severity Classification in Dysarthric Speech

Yerin Choi; Jeehyun Lee; Myoung-Wan Koo

arXiv:2412.03784 · cs.SD · December 6, 2024

Open Access

TL;DR

This paper introduces a novel feature extraction method using fine-tuned ASR transcription for dysarthric speech, significantly improving automatic severity classification accuracy while maintaining explainability.

Contribution

It proposes a new ASR-based feature extraction approach that captures detailed pronunciation and prosodic features, addressing limitations of existing methods in clinical severity prediction.

Findings

01 Achieved a balanced accuracy of 83.72% in severity prediction.

02 Enhanced feature extraction captures finer pronunciation details.

03 Outperformed existing feature-based methods in accuracy.
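
The first finding is reported as balanced accuracy rather than plain accuracy. As a minimal sketch, assuming the standard definition (the unweighted mean of per-class recall, which is also what scikit-learn's `balanced_accuracy_score` computes), the difference can be illustrated as follows. The severity labels and predictions here are invented for illustration and are not data from the paper.

```python
from collections import defaultdict


def balanced_accuracy(y_true, y_pred):
    """Unweighted mean of per-class recall over the classes present in y_true."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one sample is required")

    total = defaultdict(int)    # samples per true class
    correct = defaultdict(int)  # correctly predicted samples per true class
    for truth, pred in zip(y_true, y_pred):
        total[truth] += 1
        if truth == pred:
            correct[truth] += 1

    recalls = [correct[c] / total[c] for c in total]
    return sum(recalls) / len(recalls)


# Hypothetical, imbalanced severity labels (NOT from the paper).
y_true = ["mild"] * 6 + ["moderate"] * 3 + ["severe"] * 1
y_pred = ["mild"] * 6 + ["moderate", "moderate", "mild"] + ["mild"]

plain_accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
bal_accuracy = balanced_accuracy(y_true, y_pred)
print(f"plain accuracy:    {plain_accuracy:.4f}")
print(f"balanced accuracy: {bal_accuracy:.4f}")
```

In this toy case the classifier never predicts the rarest class, so plain accuracy stays high while balanced accuracy drops. That sensitivity to minority classes is the usual reason to prefer the metric when severity classes are unevenly represented.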

Abstract

Due to the subjective nature of current clinical evaluation, the need for automatic severity evaluation in dysarthric speech has emerged. DNN models outperform ML models but lack user-friendly explainability. ML models offer explainable results at a feature level, but their performance is comparatively lower. Current ML models extract various features from raw waveforms to predict severity. However, existing methods do not encompass all dysarthric features used in clinical evaluation. To address this gap, we propose a feature extraction method that minimizes information loss. We introduce an ASR transcription as a novel feature extraction source. We fine-tune the ASR model for dysarthric speech, then use this model to transcribe dysarthric speech and extract word segment boundary information. This enables the capture of finer pronunciation and broader prosodic features. These features…
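
The abstract mentions word segment boundary information as a source of prosodic features but is cut off before naming the features themselves. The sketch below is therefore only an illustration of the general idea, under stated assumptions: the `WordSegment` type, the `timing_features` function, the 0.15 s pause threshold, and the specific features (speaking rate, articulation rate, pause statistics) are all hypothetical choices, not the authors' feature set or pipeline. It assumes an ASR system has already produced per-word start and end times in seconds.

```python
from dataclasses import dataclass


@dataclass
class WordSegment:
    """One transcribed word with its boundaries in seconds (hypothetical type)."""
    word: str
    start: float
    end: float


def timing_features(segments, min_pause=0.15):
    """Derive simple timing features from word boundaries.

    min_pause is an assumed threshold: gaps shorter than this are not
    counted as pauses.
    """
    if not segments:
        raise ValueError("at least one word segment is required")

    segs = sorted(segments, key=lambda s: s.start)
    total_time = segs[-1].end - segs[0].start          # first onset to last offset
    speech_time = sum(s.end - s.start for s in segs)   # time spent inside words
    gaps = [nxt.start - cur.end for cur, nxt in zip(segs, segs[1:])]
    pauses = [g for g in gaps if g >= min_pause]

    return {
        "speaking_rate_wps": len(segs) / total_time if total_time > 0 else 0.0,
        "articulation_rate_wps": len(segs) / speech_time if speech_time > 0 else 0.0,
        "pause_count": len(pauses),
        "mean_pause_s": sum(pauses) / len(pauses) if pauses else 0.0,
        "pause_ratio": sum(pauses) / total_time if total_time > 0 else 0.0,
    }


# Invented boundaries for a three-word utterance (NOT data from the paper).
segments = [
    WordSegment("the", 0.00, 0.30),
    WordSegment("quick", 0.50, 1.00),
    WordSegment("fox", 1.05, 1.50),
]
features = timing_features(segments)
for name, value in features.items():
    print(f"{name}: {value:.3f}" if isinstance(value, float) else f"{name}: {value}")
```

Features of this kind stay interpretable at the feature level, which matches the explainability motivation stated in the abstract, though the actual features and classifier used in the paper may differ.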

Taxonomy

Topics: Voice and Speech Disorders · Phonetics and Phonology Research