LLMCARE: early detection of cognitive impairment via transformer models enhanced by LLM-generated synthetic data
Ali Zolnour, Hossein Azadmaleki, Yasaman Haghbin, Fatemeh Taherinezhad, Mohamad Javad Momeni Nezhad, Sina Rashidi, Masoud Khani, AmirSajjad Taleban, Samin Mahdizadeh Sani, Maryam Dadkhah, James M. Noble, Suzanne Bakken, Yadollah Yaghoobzadeh, Abdol-Hossein Vahabie

TL;DR
This paper introduces LLMCARE, a speech-based screening pipeline that combines transformer embeddings with handcrafted linguistic features to detect early signs of cognitive decline, and uses LLM-generated synthetic data to improve detection accuracy.
Contribution
The paper integrates transformer models with LLM-generated synthetic data for early detection of cognitive impairment from speech.
Findings
A fusion model combining transformer embeddings and linguistic features achieved an F1-score of 83.32 on the ADReSSo dataset.
Synthetic data augmentation with MedAlpaca-7B improved performance to F1 = 85.65 at 2× scale.
The pipeline generalized to an MCI-only cohort with F1 = 72.82 on the Delaware corpus.
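As a rough illustration of the ideas behind these findings, the sketch below shows feature fusion by concatenation, mixing synthetic transcripts into a training set, and computing an F1-score. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names are hypothetical, fusion is assumed to be simple concatenation, and "2× scale" is read here as adding twice as many synthetic samples as there are real ones, which the summary above does not confirm.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def fuse_features(embedding: Sequence[float], linguistic: Sequence[float]) -> List[float]:
    """Fuse two feature views of one transcript by concatenation (assumed fusion scheme)."""
    return list(embedding) + list(linguistic)


def augment_training_set(real: Sequence[T], synthetic_pool: Sequence[T], scale: float = 2.0) -> List[T]:
    """Append synthetic samples to the real training set.

    Assumption: `scale` is the ratio of synthetic to real samples,
    so scale=2.0 adds 2 * len(real) synthetic items.
    """
    n_synthetic = int(scale * len(real))
    if n_synthetic > len(synthetic_pool):
        raise ValueError("synthetic pool is too small for the requested scale")
    return list(real) + list(synthetic_pool[:n_synthetic])


def f1_score(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> float:
    """Binary F1 as a fraction in [0, 1]: harmonic mean of precision and recall."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No true positives: precision and recall are both zero or undefined.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

The scores listed above appear to be on a 0 to 100 scale, so a value returned by `f1_score` would be multiplied by 100 before comparison. Only the training set should be augmented; evaluating on synthetic transcripts would not measure performance on real speech.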
Abstract
Alzheimer’s disease and related dementias (ADRD) affect nearly five million older adults in the United States, yet more than half remain undiagnosed. Speech-based natural language processing (NLP) provides a scalable approach to identify early cognitive decline by detecting subtle linguistic markers that may precede clinical diagnosis. This study aims to develop and evaluate a speech-based screening pipeline that integrates transformer-based embeddings with handcrafted linguistic features, incorporates synthetic augmentation using large language models (LLMs), and benchmarks unimodal and multimodal LLM classifiers. External validation was performed to assess generalizability to an MCI-only cohort. Transcripts were obtained from the ADReSSo 2021 benchmark dataset (n = 237; derived from the Pitt Corpus, DementiaBank) and the DementiaBank Delaware corpus (n = 205; clinically diagnosed…
Taxonomy
Topics: Neurobiology of Language and Bilingualism · Language Development and Disorders · Voice and Speech Disorders
