# Multimodal AI for Alzheimer Disease Diagnosis: Systematic Review of Datasets, Models, and Modalities

**Authors:** Ziwen Yu, Anthony Mulholland, Tianyan Huang, Qiang Liu

PMC · DOI: 10.2196/85414 · 2026-03-25

## TL;DR

This systematic review of 66 studies finds that multimodal AI models outperform single-modal ones for Alzheimer disease diagnosis, prognosis, and risk prediction, but highlights the need for standardized benchmarks and better generalization.

## Contribution

The paper provides a unified synthesis of multimodal AI models for AD diagnosis across diverse datasets, enabling cross-domain performance comparisons.

## Key findings

- Multimodal AI models consistently outperformed single-modal approaches in Alzheimer's diagnosis and prognosis.
- ADNI-based models achieved 92.5% average accuracy, while MCI conversion models reached 0.922 average AUC.
- Self-collected datasets showed high average accuracy (around 96%), but their generalizability is limited by small sample sizes and single-center designs.

## Abstract

Early detection of Alzheimer disease (AD) is essential for timely intervention; yet, diagnostic performance varies widely across modalities and datasets. Recent multimodal artificial intelligence (AI) models have made significant progress, but the evidence base remains fragmented due to heterogeneous datasets, modeling frameworks, and reporting quality.

This systematic review aimed to analyze studies published over a 5-year period on multimodal AI models for AD diagnosis, prognosis, and risk prediction. We evaluated dataset characteristics, modality combinations, modeling strategies, performance metrics, and methodological limitations. We further discuss real-world implications and translational pathways.

Following PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) 2020 guidelines, we systematically searched PubMed, IEEE Xplore, Scopus, ACM Digital Library, Cochrane, and arXiv; all sources were last searched on November 15, 2025. Studies applying multimodal machine learning or deep learning to AD, mild cognitive impairment, or dementia outcomes were included, whereas studies using a single modality or lacking sufficient methodological detail were excluded. Risk of bias was assessed using QUADAS-2 (Revised Quality Assessment of Diagnostic Accuracy Studies tool). Extracted performance results were synthesized across 4 major multimodal dataset families.

A total of 66 studies met the inclusion criteria. Across datasets, multimodal models consistently outperformed single-modal baselines. Alzheimer’s Disease Neuroimaging Initiative–based diagnosis achieved an average accuracy of 92.5% (SD 3.8%), while mild cognitive impairment–conversion models achieved an average area under the curve (AUC) of 0.922 (SD 0.045), and several fusion architectures reported AUCs above 0.95. In contrast, UK Biobank risk-prediction studies reported an average AUC of 0.84 (SD 0.056), reflecting performance in large, population-based datasets. DementiaBank speech-language studies achieved an average AUC of 0.813 (SD 0.042), and cross-lingual AD detection achieved an accuracy of 77% (SD 6.5%). Self-collected multimodal datasets demonstrated average accuracies around 96% (SD 2.4%), but their generalizability is limited due to small sample sizes and single-center designs.
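The "average (SD)" figures above summarize metrics reported by individual studies within each dataset family. As a minimal sketch of that kind of summary, assuming an unweighted mean and a sample standard deviation (the abstract does not state which estimator or weighting the authors used), the calculation might look like the following. The function names and the example values are hypothetical and are not taken from the review.

```python
from statistics import mean, stdev


def summarize(values):
    """Return (mean, sample SD) for a list of study-level metrics.

    Assumption: every study counts equally (no weighting by sample size).
    """
    if len(values) < 2:
        raise ValueError("need at least two studies to compute an SD")
    return mean(values), stdev(values)


def format_summary(values, digits=3):
    """Format a list of metrics as 'mean (SD sd)' with fixed decimals."""
    m, sd = summarize(values)
    return f"{m:.{digits}f} (SD {sd:.{digits}f})"


# Hypothetical AUCs for illustration only; NOT data from the review.
example_aucs = [0.90, 0.92, 0.94]
print(format_summary(example_aucs))  # 0.920 (SD 0.020)
```

An unweighted summary like this treats a small single-center study the same as a large cohort study, which is one reason averages across heterogeneous studies should be read with the caution the abstract itself recommends.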

This systematic review demonstrates that multimodal AI models consistently outperform single-modal models for AD diagnosis, prognosis, and risk prediction by integrating complementary biological, clinical, and behavioral information. Unlike prior reviews, it provides a unified synthesis across heterogeneous clinical, imaging, genetic, and linguistic datasets, enabling cross-domain comparison of modeling strategies and performance. However, the generalizability of reported performance was limited by substantial heterogeneity in dataset composition, outcome definitions, and validation, as well as by prevalent risks of bias. By evaluating these factors, this review clarifies where current evidence is robust and where caution is warranted. The findings highlight the need for standardized multimodal benchmarks, transparent evaluation protocols, and clinically grounded model design to enable reliable real-world deployment. Overall, this work advances the field by framing multimodal AI not only as a performance-driven tool but also as a translational framework for equitable, interpretable, and scalable AD diagnosis.

## Linked entities

- **Diseases:** Alzheimer disease (MONDO:0004975), dementia (MONDO:0001627)

## Full-text entities

- **Genes:** TNFSF12 (TNF superfamily member 12) [NCBI Gene 8742] {aka APO3L, DR3LG, TNF12, TNLG4A, TWEAK}, SHROOM4 (shroom family member 4) [NCBI Gene 57477] {aka MRXSSDS, SHAP, shrm4}, MAPT (microtubule associated protein tau) [NCBI Gene 4137] {aka DDPAC, FTD1, FTDP-17, MAPTL, MSTD, MTBT1}, APOE (apolipoprotein E) [NCBI Gene 348] {aka AD2, APO-E, ApoE4, LDLCQ5, LPG}
- **Diseases:** atherosclerotic cardiovascular disease (MESH:D050197), Amyotrophic Lateral Sclerosis (MESH:D000690), neuropathological (MESH:D009422), vascular dementia (MESH:D015140), neurodegenerative disorder (MESH:D019636), cardiovascular disease (MESH:D002318), Frontotemporal Lobar Degeneration (MESH:D057174), Cognitive Impairment (MESH:D003072), FTD (MESH:D057180), AD (MESH:D000544), ALS (MESH:D008113), Parkinson (MESH:D010302), MCI (MESH:D060825), Dementia (MESH:D003704)
- **Chemicals:** FDG (MESH:D019788), ADReFV (-)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Figures

4 figures with captions in the complete paper: https://tomesphere.com/paper/PMC13018777/full.md

---
Source: https://tomesphere.com/paper/PMC13018777