A Novel Fusion Architecture for PD Detection Using Semi-Supervised Speech Embeddings

Tariq Adnan, Abdelrahman Abdelkader, Zipei Liu, Ekram Hossain, Sooyong Park, MD Saiful Islam, and Ehsan Hoque

arXiv:2405.17206 · cs.SD · November 20, 2024 · 1 citation

TL;DR

This paper introduces a new fusion architecture that uses semi-supervised speech embeddings to detect Parkinson's disease (PD) from speech recorded in diverse settings, reaching an AUROC of 88.94% on the main dataset and remaining robust across demographic subgroups and unseen clinical datasets.

Contribution

The study presents a novel fusion model that aligns multiple speech embeddings into a cohesive feature space, outperforming concatenation-based fusion and models built on traditional acoustic features, and remaining robust on diverse and unseen datasets.

Findings

1. Achieved an AUROC of 88.94% and an accuracy of 85.65% on the main dataset.
2. Maintained AUROC scores above 78% on unseen clinical datasets.
3. The model performs equitably across demographic subgroups.

Abstract

We present a framework to recognize Parkinson's disease (PD) from recordings of an English pangram utterance, collected using a web application in diverse recording settings and environments, including participants' homes. Our dataset includes a global cohort of 1306 participants, 392 of whom are diagnosed with PD. Leveraging the diversity of the dataset, which spans various demographic properties (such as age, sex, and ethnicity), we used deep learning embeddings derived from semi-supervised models such as Wav2Vec 2.0, WavLM, and ImageBind to represent the speech dynamics associated with PD. Our novel fusion model for PD classification, which aligns different speech embeddings into a cohesive feature space, demonstrated superior performance over standard concatenation-based fusion models and other baselines (including models built on traditional acoustic features). In a randomized data split…
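The abstract contrasts concatenation-based fusion with a model that aligns several speech embeddings into a cohesive feature space, but it does not specify that architecture here. The following is a hypothetical NumPy sketch of the contrast, not the authors' model: the embedding sizes, `SHARED_DIM`, the per-model linear projections, the L2 normalisation, the averaging step, and the logistic head are all assumptions chosen for illustration, and random vectors stand in for real Wav2Vec 2.0, WavLM, and ImageBind outputs.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical per-utterance embedding sizes (assumed, for illustration only).
EMBED_DIMS = {"wav2vec2": 768, "wavlm": 768, "imagebind": 1024}
SHARED_DIM = 256  # hypothetical size of the shared ("cohesive") feature space


def concat_fusion(embeddings):
    """Baseline: place the raw embeddings side by side in one long vector."""
    return np.concatenate([embeddings[name] for name in EMBED_DIMS], axis=-1)


def init_projections(shared_dim=SHARED_DIM):
    """One linear map per embedding type into the shared space.
    Randomly initialised here (assumption); a real model would learn them."""
    return {
        name: rng.normal(scale=dim ** -0.5, size=(dim, shared_dim))
        for name, dim in EMBED_DIMS.items()
    }


def aligned_fusion(embeddings, projections):
    """Project each embedding into the shared space, L2-normalise it so no
    single model dominates by scale, then average the aligned vectors."""
    aligned = []
    for name in EMBED_DIMS:
        z = embeddings[name] @ projections[name]
        z = z / np.linalg.norm(z, axis=-1, keepdims=True)
        aligned.append(z)
    return np.mean(aligned, axis=0)


def classify(features, weights, bias=0.0):
    """Hypothetical logistic-regression head: one PD probability per utterance."""
    return 1.0 / (1.0 + np.exp(-(features @ weights + bias)))


# Usage: a batch of 4 utterances with placeholder (random) embeddings.
batch = {name: rng.normal(size=(4, dim)) for name, dim in EMBED_DIMS.items()}
concat = concat_fusion(batch)                      # shape (4, 2560)
fused = aligned_fusion(batch, init_projections())  # shape (4, 256)
probs = classify(fused, rng.normal(size=SHARED_DIM))
```

The design point being illustrated: concatenation keeps each model's coordinate system separate (a 2560-dimensional vector under these assumed sizes), whereas projecting into one shared space gives every embedding the same dimensionality and scale before they are combined, which is one way to make them directly comparable.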

Taxonomy

Topics: Speech and Audio Processing · Speech Recognition and Synthesis