ADRD-Bench: A Preliminary LLM Benchmark for Alzheimer's Disease and Related Dementias
Guangxin Zhao, Jiahao Zheng, Malaz Boustani, Jarek Nabrzyski, Meng Jiang, Yiyu Shi, Zhi Zheng

TL;DR
ADRD-Bench is a new benchmark dataset designed to evaluate large language models' knowledge and reasoning in Alzheimer's Disease and Related Dementias, combining clinical and caregiving questions to address existing evaluation gaps.
Contribution
This paper introduces ADRD-Bench, the first comprehensive ADRD-specific benchmark for LLMs, including clinical and caregiving questions, and evaluates 33 models to identify limitations and areas for improvement.
Findings
Top models achieved accuracy above 0.9.
Reasoning quality varied significantly across models.
LLMs need domain-specific enhancements for ADRD.
Abstract
Large language models (LLMs) have shown great potential for healthcare applications. However, existing evaluation benchmarks provide minimal coverage of Alzheimer's Disease and Related Dementias (ADRD). To address this gap, we introduce ADRD-Bench, the first ADRD-specific benchmark dataset designed for rigorous evaluation of LLMs. ADRD-Bench has two components: 1) ADRD Unified QA, a synthesis of 1,352 questions consolidated from seven established medical benchmarks, providing a unified assessment of clinical knowledge; and 2) ADRD Caregiving QA, a novel set of 149 questions derived from the Aging Brain Care (ABC) program, a widely used, evidence-based brain health management program. Guided by a program with national expertise in comprehensive ADRD care, this new set was designed to mitigate the lack of practical caregiving context in existing benchmarks. We evaluated 33…
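As a rough illustration of how results on the two components could be reported, the sketch below computes per-component accuracy and a size-weighted pooled accuracy over all 1,501 questions (1,352 + 149). Only the component sizes come from the abstract. The record format, the component keys `unified_qa` and `caregiving_qa`, the case-insensitive exact-match scoring, and both function names are assumptions made for illustration, not the authors' evaluation code.

```python
from collections import defaultdict

# Component sizes as stated in the abstract; the key names are hypothetical.
COMPONENT_SIZES = {"unified_qa": 1352, "caregiving_qa": 149}


def accuracy_by_component(records):
    """Return accuracy per component.

    `records` is an iterable of (component, predicted, gold) tuples.
    This format and the exact-match scoring are assumptions.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for component, predicted, gold in records:
        total[component] += 1
        # Case-insensitive exact match on the answer label.
        if predicted.strip().upper() == gold.strip().upper():
            correct[component] += 1
    return {c: correct[c] / total[c] for c in total}


def pooled_accuracy(per_component, sizes=COMPONENT_SIZES):
    """Weight each component's accuracy by its number of questions."""
    n = sum(sizes[c] for c in per_component)
    return sum(per_component[c] * sizes[c] for c in per_component) / n


if __name__ == "__main__":
    # Toy records, not real benchmark data.
    toy = [
        ("unified_qa", "a", "A"),
        ("unified_qa", "B", "C"),
        ("caregiving_qa", "D", "D"),
    ]
    per_component = accuracy_by_component(toy)
    print(per_component)
    print(pooled_accuracy(per_component))
```

Because the clinical component is roughly nine times larger than the caregiving one, a pooled score is dominated by clinical questions, so reporting the two accuracies separately keeps caregiving performance visible.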
Taxonomy
Topics: Machine Learning in Healthcare · Artificial Intelligence in Healthcare and Education · Dementia and Cognitive Impairment Research
