GenAI Outperforms Human Reviewers in Classifying Alzheimer’s Disease and Related Dementia Research

Duo Wei; Riya Goyal; Tasnim Raisa; Jeannine Elmasri; Jessica Fleck

PMC · DOI: 10.1093/geroni/igaf122.2250 · December 31, 2025


Open Access

TL;DR

This study finds that generative AI models (DeepSeek, ChatGPT, and Google Gemini) classify Alzheimer’s disease and related dementia research papers into screening, diagnosis, and intervention categories more accurately than trained human reviewers.

Contribution

The study demonstrates that GenAI models outperform trained human reviewers at classifying ADRD literature, with congruence measured by mutual information and performance measured by classification accuracy.

Findings

01

AI models agreed strongly with one another, with pairwise mutual information scores as high as 0.910, whereas congruency among human reviewers was 0.45 or below.

02

AI achieved an average classification accuracy of 0.757, significantly higher than human reviewers' 0.495 (p = 0.0019).

03

AI1 (DeepSeek) had the highest accuracy at 0.818, suggesting potential for AI in improving ADRD literature classification.
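The accuracy figures above are proportions of papers assigned the correct category. A minimal sketch of that calculation follows; the function name `accuracy` and the toy labels are hypothetical, and the source does not describe how the reference labels were established. As a purely arithmetic illustration, 54 correct out of 66 papers would round to 0.818, though the summary does not report the underlying counts.

```python
def accuracy(predicted, reference):
    """Fraction of items whose predicted category matches the reference category."""
    if len(predicted) != len(reference):
        raise ValueError("label lists must have the same length")
    if not reference:
        raise ValueError("label lists must not be empty")
    correct = sum(p == r for p, r in zip(predicted, reference))
    return correct / len(reference)


# Toy labels for illustration only (not data from the study).
reference = ["screening", "diagnosis", "intervention", "diagnosis"]
reviewer = ["screening", "diagnosis", "diagnosis", "diagnosis"]
print(accuracy(reviewer, reference))
```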

Abstract

Categorizing research literature is critical in aging-related studies [1], particularly for Alzheimer’s disease and related dementias (ADRD), where articles are typically classified into screening, diagnosis, and intervention [2]. This study compares the performance of human reviewers and GenAI in classifying ADRD literature. The human group included three trained individuals with computer science backgrounds and expertise in systematic reviews on mental health for older adults, while the AI group comprised DeepSeek (AI1), ChatGPT (AI2), and Google Gemini (AI3). Sixty-six PubMed papers were analyzed to evaluate congruence and accuracy. Congruence was measured using mutual information, a metric from information theory that quantifies shared information between variables. Results revealed strong agreement among AI models, with scores of 0.910 for AI1 and AI2, 0.686 for AI2 and AI3, and…
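To make the congruence metric concrete, here is a minimal sketch of mutual information between two reviewers' category labels, estimated from empirical label frequencies. The function name `mutual_information`, the toy labels, and the choice of log base 2 are assumptions made for illustration. The abstract does not state which log base or normalization the authors used, so values from this sketch are not directly comparable to the reported scores.

```python
from collections import Counter
from math import log


def mutual_information(labels_a, labels_b, base=2.0):
    """Empirical mutual information I(A; B) between two equal-length label lists.

    I(A; B) = sum over (a, b) of p(a, b) * log(p(a, b) / (p(a) * p(b))).
    The log base is an assumption; the source does not specify one.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("label lists must have the same length")
    n = len(labels_a)
    if n == 0:
        raise ValueError("label lists must not be empty")

    joint = Counter(zip(labels_a, labels_b))  # counts of each (a, b) pair
    count_a = Counter(labels_a)               # marginal counts for reviewer A
    count_b = Counter(labels_b)               # marginal counts for reviewer B

    mi = 0.0
    for (a, b), n_ab in joint.items():
        # p(a, b) / (p(a) p(b)) simplifies to n_ab * n / (n_a * n_b).
        mi += (n_ab / n) * log(n_ab * n / (count_a[a] * count_b[b]), base)
    return mi


# Toy labels for illustration only (not data from the study).
rater_1 = ["screening", "diagnosis", "intervention", "screening", "diagnosis", "intervention"]
rater_2 = ["screening", "diagnosis", "intervention", "screening", "diagnosis", "diagnosis"]
print(mutual_information(rater_1, rater_1))  # identical labels: maximal shared information
print(mutual_information(rater_1, rater_2))  # one disagreement: lower score
```

Higher values mean that knowing one reviewer's label tells you more about the other's. Two reviewers who label independently of each other score close to zero.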

Linked entities


Diseases (2): Alzheimer’s disease, dementia


Taxonomy

Topics: Artificial Intelligence in Healthcare and Education · Mental Health via Writing · Meta-analysis and systematic reviews