# Speaker-independent dysarthria severity classification using self-supervised transformers and multi-task learning

**Authors:** Balasundaram Kadirvelu, Lauren Stumpf, Sigourney Waibel, A. Aldo Faisal

PMC · DOI: 10.1371/journal.pdig.0001076 · PLOS Digital Health · 2025-11-12

## TL;DR

This paper introduces a machine learning framework for classifying dysarthria severity from speech, offering a more objective and accessible alternative to traditional assessments.

## Contribution

A novel deep-learning framework using self-supervised transformers and multi-task learning for speaker-independent dysarthria severity classification.

## Key findings

- The SALR framework achieved 70.5% accuracy on the UA-Speech dataset under leave-one-subject-out cross-validation, a 16.5 percentage-point absolute (30% relative) improvement over prior benchmarks.
- Explainability analysis shows the model reduces reliance on speaker-specific cues and improves latent space structure.
- The framework demonstrates robustness and generalisability for automated dysarthria assessments.
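
The absolute and relative improvement figures can be cross-checked with a few lines of arithmetic. This is a minimal sketch: the prior-benchmark accuracy is not quoted in this summary, so it is inferred here by subtracting the reported absolute gain from the reported SALR accuracy.

```python
# Reported in the summary.
salr_accuracy = 70.5   # percent
absolute_gain = 16.5   # percentage points

# Inferred, not quoted: the prior benchmark implied by the two figures above.
prior_accuracy = salr_accuracy - absolute_gain

# Relative improvement is the absolute gain expressed as a share of the prior accuracy.
relative_gain = 100.0 * absolute_gain / prior_accuracy

print(f"implied prior benchmark: {prior_accuracy:.1f}%")
print(f"relative improvement: {relative_gain:.1f}%")
```

Under this inference the relative gain comes out at roughly 30%, consistent with the figure given in the abstract.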

## Abstract

Dysarthria, characterised by slurred speech, is a hallmark of many neurological disorders and brain trauma. Clinical assessment requires an audio-visual investigation by a trained healthcare expert, who evaluates criteria such as respiration, phonation, articulation, resonance, and prosody during speech. Quantitative assessment of dysarthria is challenging due to its complexity, variability, and the subjective nature of human-observation-based scoring methods. We present a novel machine-learning framework using transformers for stratifying and monitoring patient speech. Our framework integrates a wav2vec 2.0 model, pre-trained on raw speech data from healthy individuals. To reduce reliance on speaker-specific characteristics and effectively manage the intrinsic intra-class variability of dysarthric speech, we employ a contrastive learning strategy with a multi-task objective: cross-entropy loss for classifying dysarthria severity, and triplet margin loss to ensure latent embeddings are grouped by severity rather than by speaker. This Speaker-Agnostic Latent Regularisation (SALR) framework provides an objective, accessible, and cost-effective alternative to traditional assessments. On the UA-Speech dataset, SALR achieved 70.5% accuracy and 59.2% F1 using leave-one-subject-out cross-validation, a 16.5 percentage-point absolute (30% relative) improvement over prior benchmarks. Explainability analysis indicates that our multi-task objective enhances the ordinal structure of the latent space, reducing dependence on speaker-specific cues and demonstrating robustness and generalisability. In conclusion, this proof-of-concept study demonstrates the promise of the SALR framework for speaker-independent dysarthria severity classification, with potential implications for broader clinical applications in automated dysarthria assessments.
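
The multi-task objective described above combines a classification term with a metric-learning term. The following is a minimal pure-Python sketch of that combination for a single example, not the authors' implementation. The abstract does not state the loss weighting, the margin, the distance metric, or the triplet-selection rule, so all of these are assumptions: `weight=1.0`, `margin=1.0`, Euclidean distance, a positive taken from the same severity class, and a negative taken from a different severity class. The function names are hypothetical.

```python
import math

def cross_entropy(logits, target):
    """Negative log-probability of the target class under a softmax over logits."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]

def euclidean(a, b):
    """Euclidean distance between two embedding vectors (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def triplet_margin(anchor, positive, negative, margin=1.0):
    """max(0, d(anchor, positive) - d(anchor, negative) + margin)."""
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)

def multitask_loss(logits, target, anchor, positive, negative, weight=1.0, margin=1.0):
    """Severity classification loss plus a weighted triplet term on the embeddings."""
    return cross_entropy(logits, target) + weight * triplet_margin(
        anchor, positive, negative, margin
    )

# Illustrative values only: four severity logits and 2-D embeddings.
loss = multitask_loss(
    logits=[0.0, 0.0, 0.0, 0.0],
    target=2,
    anchor=[0.0, 0.0],
    positive=[2.0, 0.0],   # same severity, further from the anchor than the negative
    negative=[1.0, 0.0],   # different severity
)
print(round(loss, 4))
```

In this toy case the positive sits further from the anchor than the negative, so the triplet term is non-zero and pushes embeddings of the same severity closer together. In PyTorch the two terms correspond to `torch.nn.CrossEntropyLoss` and `torch.nn.TripletMarginLoss`.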

Dysarthria is a speech impairment that commonly accompanies neurological conditions, including stroke, head trauma, brain tumours, Parkinson’s disease, multiple sclerosis, motor neuron disease, and cerebral palsy. Accurate assessment of dysarthria is challenging due to the complex nature of speech disorders, the variability among patients, and the biases inherent in human observation. Traditional methods for evaluating dysarthria are often subjective and rely heavily on expert opinions. There is a clear need for more standardised, efficient and accessible tools to assess dysarthria. We have developed a novel deep-learning framework to classify dysarthria severity levels directly from speech recordings without needing expert input. Our framework, tested using the Universal Access Speech (UA-Speech) dataset, achieved a classification accuracy of 70.5%, 16.5 percentage points above the previous benchmark. The results indicate that our framework provides a more consistent and objective way to classify dysarthria severity compared to traditional assessments. This advancement could lead to more reliable dysarthria evaluations in clinical environments, potentially impacting treatment approaches and improving patient care.
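
The leave-one-subject-out protocol named in the abstract is what makes the reported accuracy speaker-independent: every recording from the held-out speaker is excluded from training. A minimal sketch of such a splitter follows. The speaker labels are hypothetical and the function name is illustrative; the paper's actual data pipeline is not described in this summary.

```python
from collections import defaultdict

def leave_one_subject_out(speaker_ids):
    """Yield (held_out_speaker, train_indices, test_indices), one fold per speaker."""
    by_speaker = defaultdict(list)
    for idx, speaker in enumerate(speaker_ids):
        by_speaker[speaker].append(idx)
    for held_out in sorted(by_speaker):
        test_idx = by_speaker[held_out]
        # Training data is every recording that does not belong to the held-out speaker.
        train_idx = sorted(
            i for speaker, idxs in by_speaker.items() if speaker != held_out for i in idxs
        )
        yield held_out, train_idx, test_idx

# Hypothetical speaker label for each of six recordings.
speakers = ["A", "A", "B", "C", "C", "C"]
folds = list(leave_one_subject_out(speakers))
for held_out, train_idx, test_idx in folds:
    print(held_out, train_idx, test_idx)
```

scikit-learn offers the same behaviour through `sklearn.model_selection.LeaveOneGroupOut`, with the speaker labels passed as `groups`.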

## Linked entities

- **Diseases:** stroke (MONDO:0005098), Parkinson’s disease (MONDO:0005180), multiple sclerosis (MONDO:0005301), motor neuron disease (MONDO:0020128), cerebral palsy (MONDO:0006497)

## Full-text entities

- **Diseases:** Dysarthria (MESH:D004401), brain trauma (MESH:D000070642), neurological disorders (MESH:D009461)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12611135/full.md

## Figures

4 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12611135/full.md

## References

41 references — full list in the complete paper: https://tomesphere.com/paper/PMC12611135/full.md

---
Source: https://tomesphere.com/paper/PMC12611135