LLMCARE: early detection of cognitive impairment via transformer models enhanced by LLM-generated synthetic data

Ali Zolnour; Hossein Azadmaleki; Yasaman Haghbin; Fatemeh Taherinezhad; Mohamad Javad Momeni Nezhad; Sina Rashidi; Masoud Khani; AmirSajjad Taleban; Samin Mahdizadeh Sani; Maryam Dadkhah; James M. Noble; Suzanne Bakken; Yadollah Yaghoobzadeh; Abdol-Hossein Vahabie; Masoud Rouhizadeh; Maryam Zolnoori

arXiv:2508.10027 · cs.CL · November 14, 2025

TL;DR

This study develops a speech-based screening pipeline using transformer embeddings, linguistic features, and synthetic data from large language models to improve early detection of cognitive impairment, demonstrating promising results and generalizability.

Contribution

It introduces a novel multimodal pipeline combining transformer embeddings, handcrafted features, and LLM-generated synthetic data for early ADRD detection.

Findings

01 The fusion model achieved F1 = 83.3 on the ADReSSo dataset.

02 LLM augmentation improved data efficiency, with diminishing returns at larger scales.

03 Validation on an independent cohort supports the pipeline's potential for clinical screening.

Abstract

Alzheimer's disease and related dementias (ADRD) affect nearly five million older adults in the United States, yet more than half remain undiagnosed. Speech-based natural language processing (NLP) offers a scalable approach for detecting early cognitive decline through subtle linguistic markers that may precede clinical diagnosis. This study develops and evaluates a speech-based screening pipeline integrating transformer embeddings with handcrafted linguistic features, synthetic augmentation using large language models (LLMs), and benchmarking of unimodal and multimodal classifiers. External validation assessed generalizability to an MCI-only cohort. Transcripts were drawn from the ADReSSo 2021 benchmark dataset (n=237, Pitt Corpus) and the DementiaBank Delaware corpus (n=205, MCI vs. controls). Ten transformer models were tested under three fine-tuning strategies. A late-fusion model…
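The abstract names a late-fusion model but is cut off before describing it. In general, late fusion combines the outputs of separately trained classifiers rather than their input features. The following is a minimal sketch of that general idea, not the authors' method: the weighted-average rule, the 0.5 threshold, the function names, and all scores and labels are assumptions made up for illustration.

```python
from typing import List, Sequence


def late_fusion(p_embed: Sequence[float], p_ling: Sequence[float],
                w_embed: float = 0.5) -> List[float]:
    """Weighted average of per-sample positive-class probabilities.

    p_embed: hypothetical scores from a transformer-embedding classifier.
    p_ling:  hypothetical scores from a handcrafted-feature classifier.
    The averaging rule is an assumption; the excerpt does not specify one.
    """
    if len(p_embed) != len(p_ling):
        raise ValueError("both score lists must cover the same samples")
    if not 0.0 <= w_embed <= 1.0:
        raise ValueError("w_embed must lie in [0, 1]")
    return [w_embed * a + (1.0 - w_embed) * b for a, b in zip(p_embed, p_ling)]


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Binary F1 for the positive class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Made-up scores and labels for four transcripts (illustration only).
p_embed = [0.9, 0.2, 0.6, 0.4]
p_ling = [0.7, 0.4, 0.3, 0.8]
y_true = [1, 0, 0, 1]

fused = late_fusion(p_embed, p_ling)
y_pred = [1 if p >= 0.5 else 0 for p in fused]  # assumed decision threshold
f1 = f1_score(y_true, y_pred)
```

In this toy example each unimodal score misclassifies one transcript at the 0.5 threshold, while the averaged score classifies all four correctly. That illustrates why fusing modalities can help, but it says nothing about the results reported in the paper.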
