# NeuroFusion-ViT: A Hybrid CNN–EVA Transformer Model with Cross-Attention Fusion for MRI-Based Alzheimer’s Stage Classification

**Authors:** Derya Öztürk Söylemez, Sevinç Ay Doğru

PMC · DOI: 10.3390/diagnostics16050754 · 2026-03-03

## TL;DR

NeuroFusion-ViT is a hybrid model that fuses ConvNeXt-Small CNN features with EVA-02 Vision Transformer features through gated cross-attention to classify Alzheimer’s disease stages from MRI scans, reporting 99.86% accuracy on the OASIS dataset.

## Contribution

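Introduces a hybrid CNN–Vision Transformer model (ConvNeXt-Small combined with an EVA-02-based ViT) whose Gated Cross-Attention Fusion (G-CAF) mechanism dynamically combines the two feature streams for Alzheimer’s MRI stage classification.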
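To make the idea of gated cross-attention fusion concrete, here is a minimal, dependency-free sketch. It is not the authors' G-CAF implementation: the summary does not give the layer details, so the choice of ViT tokens as queries, CNN tokens as keys and values, a single attention head, the gate computed from the concatenated features, and all names (`gated_cross_attention`, `w_q`, `w_g`, and so on) are assumptions made for illustration. Random matrices stand in for learned weights.

```python
import math
import random

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def softmax(row):
    """Numerically stable softmax over one list of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def random_matrix(rows, cols, rng):
    """Small random weights standing in for learned projections (hypothetical)."""
    scale = 1.0 / math.sqrt(rows)
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]

def gated_cross_attention(vit_tokens, cnn_tokens, w_q, w_k, w_v, w_g):
    """Fuse ViT tokens (queries) with CNN tokens (keys/values) via a sigmoid gate.

    Assumed layout, not taken from the paper: single head, no normalization.
    """
    q = matmul(vit_tokens, w_q)
    k = matmul(cnn_tokens, w_k)
    v = matmul(cnn_tokens, w_v)
    dim = len(q[0])
    # Scaled dot-product scores: one row per ViT token, one column per CNN token.
    k_t = [list(col) for col in zip(*k)]
    scores = matmul(q, k_t)
    attn = [softmax([s / math.sqrt(dim) for s in row]) for row in scores]
    attended = matmul(attn, v)
    # Gate computed from the concatenation [ViT token, attended CNN context].
    gate_in = [t + a for t, a in zip(vit_tokens, attended)]
    gate = [[sigmoid(z) for z in row] for row in matmul(gate_in, w_g)]
    # Per-feature convex mix of the attended CNN context and the original ViT token.
    fused = [
        [g * a + (1.0 - g) * t for g, a, t in zip(g_row, a_row, t_row)]
        for g_row, a_row, t_row in zip(gate, attended, vit_tokens)
    ]
    return fused, attn, gate

rng = random.Random(0)
dim, n_vit, n_cnn = 8, 4, 6  # toy sizes, chosen only for the demo
vit_tokens = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_vit)]
cnn_tokens = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_cnn)]
w_q, w_k, w_v = (random_matrix(dim, dim, rng) for _ in range(3))
w_g = random_matrix(2 * dim, dim, rng)

fused, attn, gate = gated_cross_attention(vit_tokens, cnn_tokens, w_q, w_k, w_v, w_g)
print(len(fused), len(fused[0]))  # one fused vector per ViT token
```

The gate lets each feature dimension decide how much CNN-derived context to admit. A gate value near 0 keeps the ViT token unchanged, and a value near 1 replaces it with the attended CNN context.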

## Key findings

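- Achieved 99.86% accuracy, 0.9989 macro F1, and 0.999 ROC-AUC on the OASIS MRI dataset.
- Outperformed the single-modal and hybrid models reported in the literature for Alzheimer’s stage classification.
- Ablation studies showed that cross-attention, the gate mechanism, Dual LayerNorm, and FFN-Dropout each contributed meaningfully to performance, and 5-fold cross-validation supported the model’s generalizability.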
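The paper's abstract reports accuracy together with Macro F1. The sketch below shows how the two metrics are defined and why they can differ when classes are imbalanced. The labels are toy values invented for the example, not OASIS data, and the function names are hypothetical.

```python
def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted class matches the true class."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores (each class counts equally)."""
    classes = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never appears.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)

# Toy three-class example (made-up labels for illustration only).
y_true = [0, 0, 0, 0, 1, 1, 1, 2, 2, 2]
y_pred = [0, 0, 0, 1, 1, 1, 1, 2, 2, 0]

print(f"accuracy = {accuracy(y_true, y_pred):.4f}")
print(f"macro F1 = {macro_f1(y_true, y_pred):.4f}")
```

Because macro F1 averages per-class scores without weighting by class size, it penalizes errors on rare classes more heavily than accuracy does.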

## Abstract

Background: Alzheimer’s disease is the most common type of dementia and a progressive neurodegenerative disease that begins with neuronal damage and leads to a reduction in brain tissue. Currently, there is no cure for this disease, and existing approaches focus on alleviating symptoms. Methods: This study proposes NeuroFusion-ViT, a highly accurate and computationally efficient hybrid deep learning model for early-stage detection of Alzheimer’s disease. The model combines an EVA-02-based Vision Transformer (ViT) with the ConvNeXt-Small CNN architecture, providing powerful representation learning that can process both global context and local details. The proposed Gated Cross-Attention Fusion (G-CAF) mechanism dynamically combines two different features, offering high discriminative power and model stability. Results: In experiments conducted on the OASIS MRI dataset, the model achieved 99.86% accuracy, 0.9989 Macro F1, and 0.999 ROC-AUC values, demonstrating clear superiority over single-modal and hybrid models described in the literature. Furthermore, 5-fold cross-validation results also support the model’s high generalizability. Ablation studies showed that each of the components—cross-attention, gate mechanism, Dual LayerNorm, and FFN-Dropout—made a meaningful contribution to performance. Conclusions: The results demonstrate that the NeuroFusion-ViT architecture offers a reliable, stable, and clinically applicable solution for Alzheimer’s stage classification.
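The abstract mentions 5-fold cross-validation without describing how the folds were built. As a rough illustration only, the following sketch produces stratified folds, so that each fold keeps approximately the same class proportions. Stratification, the seed, the toy label counts, and the function name `stratified_k_fold` are assumptions, not details from the paper.

```python
import random
from collections import defaultdict

def stratified_k_fold(labels, k=5, seed=0):
    """Yield (train_indices, val_indices) pairs with class-balanced folds."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    folds = [[] for _ in range(k)]
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        # Deal each class's samples round-robin so every fold gets a fair share.
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)

    for i in range(k):
        val = sorted(folds[i])
        train = sorted(idx for j, fold in enumerate(folds) if j != i for idx in fold)
        yield train, val

# Toy imbalanced label vector (invented counts, not the OASIS class distribution).
labels = [0] * 50 + [1] * 30 + [2] * 20
splits = list(stratified_k_fold(labels, k=5, seed=0))

for fold_id, (train, val) in enumerate(splits):
    counts = [sum(labels[i] == c for i in val) for c in (0, 1, 2)]
    print(f"fold {fold_id}: train={len(train)} val={len(val)} val class counts={counts}")
```

In a real MRI pipeline one would typically train a fresh model on each `train` split and average the validation metrics across the five folds.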

## Linked entities

- **Diseases:** Alzheimer’s disease (MONDO:0004975)

## Full-text entities

- **Diseases:** dementia (MESH:D003704), neurodegenerative disease (MESH:D019636), neuronal damage (MESH:D009410), Alzheimer's (MESH:D000544)

## Figures

9 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12984189/full.md

---
Source: https://tomesphere.com/paper/PMC12984189