Evaluating the Effectiveness of Pre-Trained Audio Embeddings for Classification of Parkinson's Disease Speech Data
Emmy Postma, Cristian Tejedor-Garcia

TL;DR
This study evaluates the effectiveness of three pre-trained audio embeddings in classifying Parkinson's Disease from speech data, highlighting their strengths, biases, and challenges in clinical speech analysis.
Contribution
It systematically compares OpenL3, VGGish, and Wav2Vec2.0 embeddings for PD classification, revealing their relative performance and biases in speech-based diagnostics.
Findings
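OpenL3 outperforms VGGish and Wav2Vec2.0 on the diadochokinesis (DDK) and listen-and-repeat (LR) tasks
Wav2Vec2.0 is the only model with significant gender bias, favoring male speakers in DDK tasks
Misclassified cases show that atypical speech patterns remain a challenge for current models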
Abstract
Speech impairments are prevalent biomarkers for Parkinson's Disease (PD), motivating the development of diagnostic techniques using speech data for clinical applications. Although deep acoustic features have shown promise for PD classification, their effectiveness often varies due to individual speaker differences, a factor that has not been thoroughly explored in the existing literature. This study investigates the effectiveness of three pre-trained audio embeddings (OpenL3, VGGish, and Wav2Vec2.0) for PD classification. On the NeuroVoz dataset, OpenL3 outperforms the other two models in diadochokinesis (DDK) and listen-and-repeat (LR) tasks, capturing critical acoustic features for PD detection. Only Wav2Vec2.0 shows significant gender bias, achieving more favorable results for male speakers in DDK tasks. The misclassified cases reveal challenges with atypical speech patterns,…
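As a rough illustration of the gender-bias comparison mentioned in the abstract, per-group accuracy can be computed from a classifier's predictions and compared across speaker groups. This is a minimal sketch, not the authors' evaluation code: the function name `accuracy_by_group`, the toy labels, and the "M"/"F" tags are all hypothetical, and the excerpt does not say which metric or significance test the paper actually uses.

```python
from collections import defaultdict


def accuracy_by_group(y_true, y_pred, groups):
    """Return {group: accuracy} for parallel lists of labels, predictions, and group tags."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for truth, pred, group in zip(y_true, y_pred, groups):
        total[group] += 1
        correct[group] += int(truth == pred)
    return {g: correct[g] / total[g] for g in total}


# Toy, made-up values for illustration only (1 = PD, 0 = healthy control).
y_true = [1, 0, 1, 0, 1, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 1, 0, 0]
gender = ["M", "M", "M", "M", "F", "F", "F", "F"]

per_group = accuracy_by_group(y_true, y_pred, gender)
# A positive gap means the classifier is more accurate on male speakers.
gap = per_group["M"] - per_group["F"]
print(per_group, gap)
```

A raw accuracy gap like this is only descriptive. Deciding whether a difference is significant would need a statistical test over repeated runs or folds, which this sketch does not attempt.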
