Adapting Self-Supervised Speech Representations for Cross-lingual Dysarthria Detection in Parkinson's Disease

Abner Hernandez; Eunjung Yeo; Kwanghee Choi; Chin-Jou Li; Zhengjun Yue; Rohan Kumar Das; Jan Rusz; Mathew Magimai Doss; Juan Rafael Orozco-Arroyave; Tomás Arias-Vergara; Andreas Maier; Elmar Nöth; David R. Mortensen; David Harwath; Paula Andrea Perez-Toro

arXiv:2603.22225 · cs.CL · March 27, 2026

TL;DR

This paper introduces a novel representation-level language shift method to improve cross-lingual dysarthria detection in Parkinson's disease by reducing language-dependent biases in speech representations.

Contribution

The paper proposes a centroid-based language shift technique that aligns source-language self-supervised speech representations with the target-language distribution, improving sensitivity and F1 for cross-lingual dysarthria detection.

Findings

01. Language shift (LS) substantially improves sensitivity and F1 in cross-lingual detection

02. LS reduces language identity in the embedding space

03. LS yields smaller but consistent gains in multilingual settings

Abstract

The limited availability of dysarthric speech data makes cross-lingual detection an important but challenging problem. A key difficulty is that speech representations often encode language-dependent structure that can confound dysarthria detection. We propose a representation-level language shift (LS) that aligns source-language self-supervised speech representations with the target-language distribution using centroid-based vector adaptation estimated from healthy-control speech. We evaluate the approach on oral DDK recordings from Parkinson's disease speech datasets in Czech, German, and Spanish under both cross-lingual and multilingual settings. LS substantially improves sensitivity and F1 in cross-lingual settings, while yielding smaller but consistent gains in multilingual settings. Representation analysis further shows that LS reduces language identity in the embedding space,…
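As a rough illustration of the centroid-based idea described in the abstract, the sketch below shifts source-language embeddings by the difference between the target-language and source-language healthy-control centroids. This is a minimal sketch under stated assumptions, not the authors' implementation: the function name `language_shift`, the use of a single mean vector per language, the embedding dimension, and the synthetic data are all hypothetical, and the paper may estimate or apply the shift differently.

```python
import numpy as np


def language_shift(source_emb, source_hc_emb, target_hc_emb):
    """Translate source-language embeddings toward the target language.

    Hypothetical helper: the shift vector is the difference between the
    healthy-control (HC) centroids of the target and source languages.
    """
    src_centroid = source_hc_emb.mean(axis=0)  # mean HC embedding, source language
    tgt_centroid = target_hc_emb.mean(axis=0)  # mean HC embedding, target language
    shift = tgt_centroid - src_centroid        # one vector applied to every utterance
    return source_emb + shift


# Synthetic stand-ins for utterance-level embeddings (not real data).
rng = np.random.default_rng(0)
dim = 8
source_hc = rng.normal(loc=1.0, size=(20, dim))    # source-language healthy controls
source_pd = rng.normal(loc=1.5, size=(20, dim))    # source-language patients
target_hc = rng.normal(loc=-1.0, size=(25, dim))   # target-language healthy controls

source_all = np.vstack([source_hc, source_pd])
shifted = language_shift(source_all, source_hc, target_hc)
shifted_hc = shifted[:20]
```

Because the same vector is added to every source embedding, the offset between patients and healthy controls within the source language is unchanged; only the language-level offset, as measured on healthy controls, is removed. A classifier would then be trained on `shifted` and evaluated on target-language data.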

Taxonomy

Topics: Voice and Speech Disorders · Speech Recognition and Synthesis · Dysphagia Assessment and Management