# X-ViTCNN: A Novel Network-Level Fusion of Transfer Learning and Customized Vision Transformer for Multi-Stage Alzheimer’s Disease Prediction Using MRI Scans

**Authors:** Armughan Ali, Hooria Shahbaz, Shahid Mohammad Ganie, Manahil Mohammed Alfuraydan

PMC · DOI: 10.3390/diagnostics16060835 · 2026-03-11

## TL;DR

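This paper introduces X-ViTCNN, a model that fuses a customized Vision Transformer with two pre-trained CNNs (DenseNet201 and MobileNetV2) to classify Alzheimer's disease stages from MRI scans. It reports 97.98% accuracy on ADNI and 94.52% on OASIS, and uses Grad-CAM visualizations to make its predictions interpretable.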

## Contribution

X-ViTCNN is a network-level fusion framework that combines two transfer-learned CNNs (DenseNet201 and MobileNetV2) with a customized Vision Transformer for multi-stage Alzheimer’s disease prediction from MRI scans, with Grad-CAM visualizations supporting interpretability.
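To make the idea of network-level fusion concrete, here is a minimal, dependency-free sketch of one common way to fuse branches: concatenate each branch's feature vector and pass the result through a dense layer with softmax. Concatenation, the feature sizes, the four-class output, and all function names are assumptions made for illustration; this summary does not specify how the paper fuses its branches.

```python
import math
import random


def dense(x, weights, bias):
    """Fully connected layer: one output per row of `weights`."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_and_classify(branch_features, weights, bias):
    """Concatenate per-branch feature vectors, then apply dense + softmax."""
    fused = [v for feats in branch_features for v in feats]
    return fused, softmax(dense(fused, weights, bias))


rng = random.Random(0)

# Hypothetical feature vectors standing in for the three branches.
# Real backbone outputs are far larger; these sizes are illustrative only.
densenet_feats = [rng.gauss(0, 1) for _ in range(6)]
mobilenet_feats = [rng.gauss(0, 1) for _ in range(4)]
vit_feats = [rng.gauss(0, 1) for _ in range(5)]

NUM_CLASSES = 4  # assumed number of disease stages, not taken from the paper
fused_dim = len(densenet_feats) + len(mobilenet_feats) + len(vit_feats)
weights = [[rng.gauss(0, 0.1) for _ in range(fused_dim)] for _ in range(NUM_CLASSES)]
bias = [0.0] * NUM_CLASSES

fused, probs = fuse_and_classify(
    [densenet_feats, mobilenet_feats, vit_feats], weights, bias
)
print(len(fused), len(probs))
```

The weights here are random, so the probabilities carry no meaning; the sketch only shows how three feature vectors of different sizes become a single classifier input.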

## Key findings

- X-ViTCNN achieved 97.98% accuracy on the ADNI dataset and 94.52% on the OASIS dataset.
- The model outperformed individual baselines and other pre-trained architectures in multi-stage Alzheimer’s classification.
- Grad-CAM visualizations provided interpretable insights into the model's decision-making process.
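For readers unfamiliar with Grad-CAM, the following sketch shows the standard computation on a toy example: each channel of a convolutional layer is weighted by the spatial average of the class-score gradient, the weighted maps are summed, and a ReLU keeps only the positive evidence. The activations and gradients are made-up numbers, and the function name is hypothetical; in practice both arrays would come from a forward and backward pass through the trained network.

```python
def grad_cam(activations, gradients):
    """Compute a Grad-CAM heatmap from one convolutional layer.

    Both arguments are nested lists shaped [channels][height][width];
    `gradients` holds d(class score) / d(activation).
    """
    height = len(activations[0])
    width = len(activations[0][0])

    # Channel weights: global average of the gradients over spatial positions.
    alphas = [sum(sum(row) for row in g) / (height * width) for g in gradients]

    # Weighted sum of the activation maps, followed by ReLU.
    cam = [
        [
            max(0.0, sum(a * act[i][j] for a, act in zip(alphas, activations)))
            for j in range(width)
        ]
        for i in range(height)
    ]

    # Scale to [0, 1] for display; an all-zero map is left unchanged.
    peak = max(max(row) for row in cam)
    if peak > 0:
        cam = [[v / peak for v in row] for row in cam]
    return cam


# Toy layer with 2 channels and a 2x2 spatial grid (illustrative values only).
activations = [
    [[1.0, 0.0], [0.0, 2.0]],
    [[0.0, 3.0], [1.0, 0.0]],
]
gradients = [
    [[0.5, 0.5], [0.5, 0.5]],      # channel 0 supports the class
    [[-1.0, -1.0], [-1.0, -1.0]],  # channel 1 counts against it
]

heatmap = grad_cam(activations, gradients)
print(heatmap)  # [[0.5, 0.0], [0.0, 1.0]]
```

In this toy case the heatmap highlights the positions where channel 0 is active and suppresses those dominated by channel 1, which is the behavior that lets the map be overlaid on an MRI slice to show which regions drove a prediction.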

## Abstract

Background/Objectives: Alzheimer’s disease (AD), the most prevalent form of dementia, is characterized by an overall decline in cognitive functioning and represents a major public health crisis. Accurate and rapid diagnosis of AD remains critical; however, recent deep learning approaches using MRI data generalize poorly across samples, have high computational requirements, and offer little interpretability. Methods: In this study, we present a new framework called eXplorative ViT-CNN (X-ViTCNN) that combines a customized Vision Transformer model with two pre-trained CNNs (DenseNet201 and MobileNetV2). Using a contrast-enhancing preprocessing step to highlight neuroanatomical features and Bayesian Optimization to tune hyperparameters, we fuse local structural features from the CNNs with global representations from the transformer and feed the fused result to fully connected dense layers for multi-stage classification. We also use Grad-CAM visualizations to provide insight into how the model arrives at its classification. Results: Experiments conducted on the ADNI and OASIS datasets demonstrate the superiority of X-ViTCNN, which achieves accuracies of 97.98% and 94.52%, respectively. The model outperformed individual baselines and other pre-trained architectures, showing balanced sensitivity and specificity across all AD stages. Conclusions: The proposed X-ViTCNN framework is a powerful, interpretable method for predicting multi-stage Alzheimer’s disease from MRI scans. The combination of complementary feature learning, automatic hyperparameter optimization, and interpretability through visualization makes it a promising tool to support clinicians' decision making in the early diagnosis and ongoing monitoring of people with Alzheimer’s disease.
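As a reference for how the accuracy, sensitivity, and specificity figures in a multi-class setting are typically derived, the sketch below computes them one-vs-rest from a confusion matrix. The matrix is invented for illustration and is not taken from the paper; the function name is hypothetical, and the code assumes every class has at least one true sample and at least one negative sample.

```python
def per_class_metrics(confusion):
    """Return overall accuracy plus one-vs-rest sensitivity and specificity.

    confusion[i][j] counts samples of true class i predicted as class j.
    """
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    accuracy = sum(confusion[i][i] for i in range(n)) / total

    metrics = {}
    for k in range(n):
        tp = confusion[k][k]
        fn = sum(confusion[k]) - tp                       # class k missed
        fp = sum(confusion[i][k] for i in range(n)) - tp  # others called k
        tn = total - tp - fn - fp
        metrics[k] = {
            "sensitivity": tp / (tp + fn),
            "specificity": tn / (tn + fp),
        }
    return accuracy, metrics


# Made-up 3-class confusion matrix (not results from the paper).
confusion = [
    [8, 2, 0],
    [1, 9, 0],
    [0, 0, 10],
]

accuracy, metrics = per_class_metrics(confusion)
print(accuracy)    # 0.9
print(metrics[0])  # {'sensitivity': 0.8, 'specificity': 0.95}
```

Reporting both metrics per class matters for staged diagnosis because a model can reach high overall accuracy while still missing most cases of a rare stage.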

## Linked entities

- **Diseases:** Alzheimer’s disease (MONDO:0004975)

## Full-text entities

- **Diseases:** AD (MESH:D000544), dementia (MESH:D003704), decline in cognitive functioning (MESH:D003072)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Figures

13 figures with captions in the complete paper: https://tomesphere.com/paper/PMC13025822/full.md

---
Source: https://tomesphere.com/paper/PMC13025822