ADRD-Bench: A Preliminary LLM Benchmark for Alzheimer's Disease and Related Dementias

Guangxin Zhao; Jiahao Zheng; Malaz Boustani; Jarek Nabrzyski; Meng Jiang; Yiyu Shi; Zhi Zheng

arXiv:2602.11460 · cs.CL · February 13, 2026

TL;DR

ADRD-Bench is a new benchmark dataset designed to evaluate large language models' knowledge and reasoning in Alzheimer's Disease and Related Dementias, combining clinical and caregiving questions to address existing evaluation gaps.

Contribution

This paper introduces ADRD-Bench, the first comprehensive ADRD-specific benchmark for LLMs, including clinical and caregiving questions, and evaluates 33 models to identify limitations and areas for improvement.

Findings

01 Top models achieved over 0.9 accuracy.

02 Significant variability in model reasoning quality.

03 Need for domain-specific enhancements in LLMs.

Abstract

Large language models (LLMs) have shown great potential for healthcare applications. However, existing evaluation benchmarks provide minimal coverage of Alzheimer's Disease and Related Dementias (ADRD). To address this gap, we introduce ADRD-Bench, the first ADRD-specific benchmark dataset designed for rigorous evaluation of LLMs. ADRD-Bench has two components: 1) ADRD Unified QA, a synthesis of 1,352 questions consolidated from seven established medical benchmarks, providing a unified assessment of clinical knowledge; and 2) ADRD Caregiving QA, a novel set of 149 questions derived from the Aging Brain Care (ABC) program, a widely used, evidence-based brain health management program. Guided by a program with national expertise in comprehensive ADRD care, this new set was designed to mitigate the lack of practical caregiving context in existing benchmarks. We evaluated 33…
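
As a rough illustration of how accuracy over the two components described above might be tallied, the sketch below scores a handful of made-up predictions per component and then combines them into a question-weighted average. Only the component sizes (1,352 and 149) come from the abstract; the record layout, the component keys, the function names, and the toy predictions are all hypothetical and are not taken from the paper's evaluation code.

```python
from collections import defaultdict

# Component sizes as stated in the abstract. The dictionary keys are
# hypothetical labels chosen for this sketch.
COMPONENT_SIZES = {"unified_qa": 1352, "caregiving_qa": 149}


def accuracy_by_component(records):
    """Return per-component accuracy.

    `records` is an iterable of (component, predicted, gold) tuples.
    This layout is an assumption made for illustration only.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for component, predicted, gold in records:
        total[component] += 1
        correct[component] += int(predicted == gold)
    return {name: correct[name] / total[name] for name in total}


def weighted_accuracy(per_component, sizes):
    """Weight each component's accuracy by its number of questions."""
    n_questions = sum(sizes[name] for name in per_component)
    weighted_sum = sum(per_component[name] * sizes[name] for name in per_component)
    return weighted_sum / n_questions


# Toy predictions, invented for this example (not real model outputs).
toy_records = [
    ("unified_qa", "A", "A"),
    ("unified_qa", "B", "C"),
    ("caregiving_qa", "D", "D"),
    ("caregiving_qa", "A", "A"),
]

per_component = accuracy_by_component(toy_records)
overall = weighted_accuracy(per_component, COMPONENT_SIZES)
total_questions = sum(COMPONENT_SIZES.values())
print(per_component, round(overall, 4), total_questions)
```

Reporting the two components separately, as well as together, keeps the much smaller caregiving set from being swamped by the larger clinical set in a single pooled number.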

Code & Models

Datasets

IIRL-NotreDame/ADRD-Bench
dataset · 25 dl

Taxonomy

Topics: Machine Learning in Healthcare · Artificial Intelligence in Healthcare and Education · Dementia and Cognitive Impairment Research