GenAI Outperforms Human Reviewers in Classifying Alzheimer’s Disease and Related Dementia Research
Duo Wei, Riya Goyal, Tasnim Raisa, Jeannine Elmasri, Jessica Fleck

TL;DR
This study shows that generative AI models outperform trained human reviewers at classifying Alzheimer’s disease and related dementia (ADRD) research papers into three categories: screening, diagnosis, and intervention.
Contribution
The novel contribution is a direct comparison showing that generative AI models outperform trained human reviewers at classifying ADRD literature. Congruence between raters is measured with mutual information, and classification performance with accuracy.
Findings
The AI models agreed strongly with one another, with pairwise mutual information scores of up to 0.910. Congruence among the human reviewers averaged 0.45 or below.
AI achieved an average classification accuracy of 0.757, significantly higher than human reviewers' 0.495 (p = 0.0019).
AI1 (DeepSeek) had the highest accuracy, at 0.818. Together, these results suggest that AI could improve the classification of ADRD literature.
Abstract
Categorizing research literature is critical in aging-related studies [1], particularly for Alzheimer’s disease and related dementias (ADRD), where articles are typically classified into screening, diagnosis, and intervention [2]. This study compares the performance of human reviewers and GenAI in classifying ADRD literature. The human group included three trained individuals with computer science backgrounds and expertise in systematic reviews on mental health for older adults, while the AI group comprised DeepSeek (AI1), ChatGPT (AI2), and Google Gemini (AI3). Sixty-six PubMed papers were analyzed to evaluate congruence and accuracy. Congruence was measured using mutual information, a metric from information theory that quantifies shared information between variables. Results revealed strong agreement among AI models, with scores of 0.910 for AI1 and AI2, 0.686 for AI2 and AI3, and…
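The congruence metric can be made concrete with a short example. The following minimal Python sketch computes empirical mutual information between two raters' labels for the same set of papers. It also computes plain accuracy against a reference labeling. It illustrates the metrics under stated assumptions and is not the authors' code. The function names, the base-2 logarithm, and the example labels are all hypothetical. This summary also does not state whether the reported scores (such as 0.910) are raw or normalized mutual information.

```python
from collections import Counter
from math import log


def mutual_information(labels_a, labels_b, base=2.0):
    """Empirical mutual information I(A;B) between two raters' labels.

    labels_a[i] and labels_b[i] are the categories two raters assigned to
    paper i. Hypothetical helper; base=2 gives the result in bits.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label sequences must be non-empty and of equal length")
    n = len(labels_a)
    joint = Counter(zip(labels_a, labels_b))  # counts of (label_a, label_b) pairs
    count_a = Counter(labels_a)               # marginal counts for rater A
    count_b = Counter(labels_b)               # marginal counts for rater B
    mi = 0.0
    for (a, b), n_ab in joint.items():
        # p(a,b) * log(p(a,b) / (p(a) * p(b))), written with raw counts
        mi += (n_ab / n) * log(n_ab * n / (count_a[a] * count_b[b]), base)
    return mi


def accuracy(predicted, reference):
    """Fraction of papers where the predicted label matches the reference label."""
    if len(predicted) != len(reference) or not predicted:
        raise ValueError("label sequences must be non-empty and of equal length")
    return sum(p == r for p, r in zip(predicted, reference)) / len(reference)


if __name__ == "__main__":
    # Hypothetical labels for four papers; not data from the study.
    reference = ["screening", "diagnosis", "intervention", "diagnosis"]
    rater = ["screening", "diagnosis", "diagnosis", "diagnosis"]
    print(round(mutual_information(reference, rater), 3))  # 0.811
    print(accuracy(rater, reference))                       # 0.75
```

With three categories, raw mutual information in bits is bounded above by the smaller of the two raters' label entropies. That bound is at most log2(3) ≈ 1.585. Normalized variants rescale the score to the range [0, 1].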
Taxonomy
Topics: Artificial Intelligence in Healthcare and Education · Mental Health via Writing · Meta-analysis and systematic reviews
