Leveraging Multimodal Methods and Spontaneous Speech for Alzheimer's Disease Identification
Yifan Gao, Long Guo, Hong Liu

TL;DR
This paper presents a multimodal fusion approach that combines interpretable linguistic features with temporal embeddings from pre-trained models to detect Alzheimer's disease and mild cognitive impairment from spontaneous speech, achieving the top overall ranking in the PROCESS Grand Challenge at ICASSP 2025.
Contribution
It introduces a novel multimodal fusion strategy combining interpretable linguistic features with temporal embeddings for cognitive impairment detection.
Findings
F1-score of 0.649 for three-class classification (healthy, MCI, dementia)
RMSE of 2.628 for MMSE score prediction
Top overall ranking in the PROCESS Grand Challenge at ICASSP 2025
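For readers unfamiliar with the two metrics reported above, the following is a minimal sketch of how an F1-score and an RMSE can be computed. It is not the challenge's official scoring code. The summary does not say how the F1-score is averaged across classes, so macro-averaging is an assumption here, and the function names `macro_f1` and `rmse` and the class labels are illustrative only.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error between true and predicted scores (e.g. MMSE)."""
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def macro_f1(y_true, y_pred, labels):
    """Unweighted mean of per-class F1 (macro-averaging is an assumption)."""
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never appears
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy data, invented for illustration only (not from the paper)
labels = ["healthy", "MCI", "dementia"]
true_classes = ["healthy", "MCI", "dementia", "healthy"]
pred_classes = ["healthy", "MCI", "healthy", "healthy"]
f1 = macro_f1(true_classes, pred_classes, labels)
error = rmse([28, 24, 19], [27, 26, 19])
```

A lower RMSE means predicted MMSE scores are closer to the clinical scores, while a higher F1 means better agreement on the class labels.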
Abstract
Cognitive impairment detection through spontaneous speech is a promising avenue for early diagnosis of Alzheimer's disease (AD) and mild cognitive impairment (MCI), where timely intervention can significantly improve patient outcomes. The PROCESS Grand Challenge at ICASSP 2025 addresses these tasks by promoting innovative classification and regression methods for detecting cognitive decline. In this paper, we propose a multimodal fusion strategy that combines interpretable linguistic features with temporal embeddings extracted from pre-trained models. Our approach achieves an F1-score of 0.649 for the classification task (predicting healthy, MCI, dementia) and an RMSE of 2.628 for the regression task (MMSE score prediction), securing the top overall ranking in the competition.
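The abstract does not describe how the two feature types are combined. As one possible reading, the sketch below shows simple feature-level fusion: frame-level embeddings are mean-pooled over time and concatenated with a vector of linguistic features. This is an illustrative assumption, not the authors' method, and the function name `fuse_features` and the toy values are hypothetical.

```python
import numpy as np


def fuse_features(linguistic, temporal_embeddings):
    """Concatenate linguistic features with a time-pooled embedding.

    Assumed shapes (hypothetical, not from the paper):
      linguistic:          (n_features,)
      temporal_embeddings: (n_frames, embedding_dim)
    """
    linguistic = np.asarray(linguistic, dtype=float)
    temporal = np.asarray(temporal_embeddings, dtype=float)
    # Mean-pool over the time axis to get one fixed-size vector per recording
    pooled = temporal.mean(axis=0)
    return np.concatenate([linguistic, pooled])


# Toy values, invented for illustration only
linguistic_features = [0.5, 12.0]          # e.g. two hand-crafted measures
frame_embeddings = [[1.0, 3.0], [3.0, 5.0]]  # 2 frames, 2 dimensions
fused = fuse_features(linguistic_features, frame_embeddings)
```

The fused vector could then be passed to any classifier or regressor. Other strategies, such as combining the predictions of separate per-modality models, are equally consistent with the abstract.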
Taxonomy
Topics: Speech and dialogue systems · Emotion and Mood Recognition
