Exploiting Foundation Models and Speech Enhancement for Parkinson's Disease Detection from Speech in Real-World Operative Conditions

Moreno La Quatra; Maria Francesca Turco; Torbjørn Svendsen; Giampiero Salvi; Juan Rafael Orozco-Arroyave; Sabato Marco Siniscalchi

arXiv:2406.16128 · eess.AS · November 13, 2025 · Interspeech · 1 citation

TL;DR

This study develops a robust Parkinson's disease detector from speech using foundational models and speech enhancement. Fine-tuning the models on clean data and applying speech enhancement to real-world recordings improves detection performance under real-world conditions.

Contribution

The paper introduces an approach that combines fine-tuned foundational models with speech enhancement to improve Parkinson's disease detection from speech in real-world scenarios.

Findings

01

Fine-tuned foundational models outperform previously proposed models on the clean s-PC-GITA data.

02

Off-the-shelf speech enhancement significantly boosts performance on the real-world e-PC-GITA recordings, but only for the foundational-based models.

03

Combining the two best foundational-based models yields the best detection results in operative environments.

Abstract

This work is concerned with devising a robust Parkinson's disease (PD) detector from speech in real-world operating conditions using (i) foundational models and (ii) speech enhancement (SE) methods. To this end, we first fine-tune several foundational-based models on the standard PC-GITA (s-PC-GITA) clean data. Our results demonstrate superior performance to previously proposed models. Second, we assess the generalization capability of the PD models on the extended PC-GITA (e-PC-GITA) recordings, collected in real-world operative conditions, and observe a severe drop in performance moving from ideal to real-world conditions. Third, we align training and testing conditions by applying off-the-shelf SE techniques to e-PC-GITA, and a significant boost in performance is observed only for the foundational-based models. Finally, combining the two best foundational-based models trained on…
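The abstract is cut off before it says how the two best models are combined. As an illustration only, the sketch below assumes score-level fusion, a common way to combine two binary classifiers: each model's PD posterior for a recording is averaged, and the fused score is thresholded. The function names, the equal weighting, the 0.5 threshold, and the example scores are all hypothetical and are not taken from the paper or its repository.

```python
from typing import List, Sequence


def fuse_posteriors(p_a: Sequence[float], p_b: Sequence[float],
                    weight_a: float = 0.5) -> List[float]:
    """Weighted average of two models' PD posteriors, one value per recording.

    Assumption: score-level (late) fusion; the paper's actual combination
    scheme is not stated in the truncated abstract.
    """
    if len(p_a) != len(p_b):
        raise ValueError("both models must score the same recordings")
    if not 0.0 <= weight_a <= 1.0:
        raise ValueError("weight_a must lie in [0, 1]")
    return [weight_a * a + (1.0 - weight_a) * b for a, b in zip(p_a, p_b)]


def decide(posteriors: Sequence[float], threshold: float = 0.5) -> List[int]:
    """Map fused posteriors to labels: 1 = PD, 0 = healthy control."""
    return [1 if p >= threshold else 0 for p in posteriors]


# Made-up posteriors from two fine-tuned models on three recordings.
model_a = [0.875, 0.25, 0.125]
model_b = [0.625, 0.5, 0.375]

fused = fuse_posteriors(model_a, model_b)
labels = decide(fused)
```

With equal weights the fused score is simply the mean of the two posteriors, so a recording is labeled PD only when the models agree strongly enough on average. Other schemes (majority voting, a learned combiner) would fit the same interface.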

Code & Models

Repositories

K-STMLab/SSL4PR
PyTorch · Official

Taxonomy

Topics: Voice and Speech Disorders · Speech Recognition and Synthesis