Benchmarking Foundation Speech and Language Models for Alzheimer's Disease and Related Dementia Detection from Spontaneous Speech
Jingyu Li, Lingchao Mao, Hairong Wang, Zhendong Wang, Xi Mao, Xuelei Sherry Ni

TL;DR
This paper benchmarks foundation speech and language models on classifying Alzheimer's disease and related dementias (ADRD) from spontaneous speech, and highlights the effectiveness of acoustic embeddings, especially those from ASR models, for early detection.
Contribution
It introduces a benchmarking framework for foundation models in ADRD detection using a large clinical dataset, emphasizing the potential of acoustic features and ASR embeddings for non-invasive diagnosis.
Findings
Whisper-medium achieved the highest accuracy among speech models (0.731).
BERT with pause annotation was best among language models (accuracy 0.662).
ASR-derived embeddings outperformed other features in detection.
Abstract
Background: Alzheimer's disease and related dementias (ADRD) are progressive neurodegenerative conditions where early detection is vital for timely intervention and care. Spontaneous speech contains rich acoustic and linguistic markers that may serve as non-invasive biomarkers for cognitive decline. Foundation models, pre-trained on large-scale audio or text data, produce high-dimensional embeddings encoding contextual and acoustic features. Methods: We used the PREPARE Challenge dataset, which includes audio recordings from over 1,600 participants with three cognitive statuses: healthy control (HC), mild cognitive impairment (MCI), and Alzheimer's Disease (AD). We excluded non-English, non-spontaneous, or poor-quality recordings. The final dataset included 703 (59.13%) HC, 81 (6.81%) MCI, and 405 (34.06%) AD cases. We benchmarked a range of open-source foundation speech and language…
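The class counts in the abstract can be checked with a little arithmetic, which also gives a rough reference point for the accuracies listed under Findings. The sketch below is illustrative only: it assumes the reported accuracies can be compared against a majority-class baseline computed on the full filtered dataset, whereas the paper's actual evaluation split (not described in this excerpt) may have a different class balance. The variable names are hypothetical.

```python
# Class counts as stated in the abstract (after filtering).
counts = {"HC": 703, "MCI": 81, "AD": 405}
total = sum(counts.values())

# Share of each class in percent, rounded as in the abstract.
shares = {label: round(100 * n / total, 2) for label, n in counts.items()}

# Majority-class baseline: always predict the most frequent label.
# Assumption: computed on the full filtered dataset, not the paper's test split.
majority_label = max(counts, key=counts.get)
majority_baseline = counts[majority_label] / total

# Accuracies reported in the Findings section of this summary.
reported = {"Whisper-medium": 0.731, "BERT + pause annotation": 0.662}
gains = {name: acc - majority_baseline for name, acc in reported.items()}

print(f"total participants: {total}")
print(f"class shares (%): {shares}")
print(f"majority baseline ({majority_label}): {majority_baseline:.4f}")
for name, gain in gains.items():
    print(f"{name}: {gain:+.3f} vs. majority baseline")
```

Because MCI makes up under 7% of the cases, accuracy alone says little about how well that class is detected; per-class or balanced metrics would be needed for that, and this excerpt does not report them.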
Taxonomy
Topics: Speech and dialogue systems
Methods: Attention Dropout · Dropout · Softmax · Dense Connections · Layer Normalization · Linear Warmup With Linear Decay · BERT
