CNN-ViT Fusion with Adaptive Attention Gate for Brain Tumor MRI Classification: A Hybrid Deep Learning Model
Syed Ibad Hasnain, Muhammad Faris, Hafiza Syeda Yusra Tirmizi, Rabail Khowaja, Hafsa Israr

TL;DR
This paper introduces a hybrid deep learning model that combines CNN and ViT branches through an adaptive attention gate for brain tumor MRI classification, reaching 97.60% test accuracy.
Contribution
The paper presents a novel fusion architecture with a dynamic attention mechanism that adaptively merges local and global features for medical image classification.
Findings
Achieved 97.60% test accuracy on the Brain Tumor MRI Dataset.
Outperformed standalone CNN and ViT models as well as existing fusion methods.
Demonstrated effectiveness of dynamic feature weighting in medical imaging.
Abstract
Early detection and classification of brain tumors from Magnetic Resonance Imaging (MRI) scans is highly important, but the relevant features are difficult to extract from medical images. Convolutional Neural Networks (CNNs) are good at capturing both local texture and spatial information, whereas Vision Transformers (ViTs) are good at capturing long-range global dependencies. We propose a new hybrid architecture that combines a SqueezeNet-style CNN branch with a MobileViT-style global transformer branch through an Adaptive Attention Gate mechanism. The gate dynamically learns per-sample, per-feature weights that set the contribution of each branch, allowing context-sensitive merging of local and global representations. The proposed model has a test accuracy of 97.60%, a precision of 97.30%, a recall of 97.50%, an F1-score of 97.40%, and a macro-average area under the curve (AUC) of 0.9946 with a trained…
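The abstract describes the gate only at a high level, so the following is a minimal NumPy sketch of one common way to realize per-sample, per-feature gating, not the authors' implementation. It assumes both branches have already been pooled to feature vectors of the same width, and that the gate is a single sigmoid layer over their concatenation. The class name `AdaptiveGate`, the weight shapes, and the convex-combination form `g * local + (1 - g) * global` are all assumptions made for illustration.

```python
import numpy as np


def sigmoid(x):
    """Elementwise logistic function, mapping real values into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))


class AdaptiveGate:
    """Hypothetical gate (an assumption, not the paper's exact layer).

    Computes g = sigmoid([local; global] @ W + b) and returns the convex
    combination g * local + (1 - g) * global, so each sample and each
    feature gets its own mixing weight.
    """

    def __init__(self, dim, seed=0):
        rng = np.random.default_rng(seed)
        # Randomly initialized here; in a real model W and b would be learned.
        self.W = rng.normal(scale=0.1, size=(2 * dim, dim))
        self.b = np.zeros(dim)

    def __call__(self, local_feat, global_feat):
        # Concatenate the two branch outputs: (batch, 2 * dim)
        joint = np.concatenate([local_feat, global_feat], axis=-1)
        # One gate value per sample and per feature: (batch, dim)
        gate = sigmoid(joint @ self.W + self.b)
        # Gate near 1 favors the CNN branch, near 0 the transformer branch.
        fused = gate * local_feat + (1.0 - gate) * global_feat
        return fused, gate


# Stand-ins for pooled CNN-branch and ViT-branch features (random data).
batch, dim = 4, 8
rng = np.random.default_rng(1)
local_feat = rng.normal(size=(batch, dim))
global_feat = rng.normal(size=(batch, dim))

gate_layer = AdaptiveGate(dim)
fused, gate = gate_layer(local_feat, global_feat)
print(fused.shape, gate.shape)
```

Because the gate depends on the input features themselves, two different scans can weight the local and global branches differently, which is the context-sensitive merging the abstract refers to. A fixed scalar weight or plain concatenation would not have that property.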
