An Interpretable Vision Transformer Framework for Automated Brain Tumor Classification

Chinedu Emmanuel Mbonu; Tochukwu Sunday Belonwu; Okwuchukwu Ejike Chukwuogo; Kenechukwu Sylvanus Anigbogu

arXiv:2604.21311·cs.CV·April 24, 2026

TL;DR

This paper introduces an interpretable Vision Transformer (ViT-B/16) framework for automated four-class brain tumor classification from MRI scans, reporting 99.29% test accuracy and using attention rollout heatmaps for clinical interpretability.

Contribution

It presents a ViT-based system with a clinically motivated preprocessing and training pipeline, including CLAHE contrast enhancement, that outperforms CNN baselines in accuracy and recall on brain tumor classification.

Findings

01. Achieved 99.29% test accuracy in four-class brain tumor classification.
02. Utilized attention rollout for clinically interpretable heatmaps.
03. Outperformed CNN-based models in accuracy and recall.
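
The attention rollout mentioned in the findings can be illustrated with a minimal NumPy sketch. It follows the commonly used formulation (average the heads, mix in the identity to account for residual connections, renormalize, multiply across layers); the paper's exact variant is not given in this summary. The function names `attention_rollout` and `cls_heatmap` are hypothetical, the attention maps are random stand-ins rather than model outputs, and the 197-token layout assumes a 224x224 input (14x14 patches plus one class token), which the summary does not state.

```python
import numpy as np


def attention_rollout(attentions):
    """Roll attention out across layers.

    attentions: sequence of arrays shaped (heads, tokens, tokens), one per
    transformer layer, ordered from the first layer to the last. Each row is
    assumed to be a softmax output, so it sums to 1.
    """
    num_tokens = attentions[0].shape[-1]
    identity = np.eye(num_tokens)
    rollout = identity.copy()
    for layer_attention in attentions:
        # Fuse the heads by averaging them.
        fused = layer_attention.mean(axis=0)
        # Mix in the identity to model the residual connection.
        fused = 0.5 * fused + 0.5 * identity
        # Keep every row a probability distribution.
        fused = fused / fused.sum(axis=-1, keepdims=True)
        # Compose this layer with everything below it.
        rollout = fused @ rollout
    return rollout


def cls_heatmap(rollout, grid_size):
    """Turn the class-token row into a patch-grid heatmap scaled to max 1."""
    cls_to_patches = rollout[0, 1:]  # drop the class token's own column
    heatmap = cls_to_patches.reshape(grid_size, grid_size)
    return heatmap / heatmap.max()


# Stand-in attention maps (assumption): 12 layers and 12 heads as in ViT-B/16,
# 197 tokens for an assumed 224x224 input. Real maps would come from the model.
rng = np.random.default_rng(0)
logits = rng.normal(size=(12, 12, 197, 197))
weights = np.exp(logits)
fake_attentions = weights / weights.sum(axis=-1, keepdims=True)

heatmap = cls_heatmap(attention_rollout(fake_attentions), grid_size=14)
print(heatmap.shape)  # (14, 14)
```

In practice the 14x14 map would be upsampled to the MRI slice resolution and overlaid on the image; that step is omitted here.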

Abstract

Brain tumors represent one of the most critical neurological conditions, where early and accurate diagnosis is directly correlated with patient survival rates. Manual interpretation of Magnetic Resonance Imaging (MRI) scans is time-intensive, subject to inter-observer variability, and demands significant specialist expertise. This paper proposes a deep learning framework for automated four-class brain tumor classification distinguishing glioma, meningioma, pituitary tumor, and healthy brain tissue from a dataset of 7,023 MRI scans. The proposed system employs a Vision Transformer (ViT-B/16) pretrained on ImageNet-21k as the backbone, augmented with a clinically motivated preprocessing and training pipeline. Contrast Limited Adaptive Histogram Equalization (CLAHE) is applied to enhance local contrast and accentuate tumor boundaries invisible to standard normalization. A two-stage…
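
To make the contrast-limiting idea behind CLAHE concrete, the sketch below applies clip-limited histogram equalization to a single tile using NumPy. It is a simplification under stated assumptions, not the paper's preprocessing code: full CLAHE also splits the image into a grid of tiles and interpolates between neighbouring tile mappings, which is what a library routine such as OpenCV's `cv2.createCLAHE` handles. The function name `clip_limited_equalize`, the clip limit of 2.0, and the 64x64 tile are illustrative choices; the summary does not give the parameters the authors used.

```python
import numpy as np


def clip_limited_equalize(tile, clip_limit=2.0, bins=256):
    """Contrast-limited histogram equalization of one uint8 tile.

    clip_limit is a multiple of the mean bin count (an assumed convention).
    """
    hist = np.bincount(tile.ravel(), minlength=bins).astype(np.float64)
    # Cap each bin so that no single intensity dominates the mapping.
    ceiling = clip_limit * tile.size / bins
    excess = np.clip(hist - ceiling, 0, None).sum()
    # Redistribute the clipped counts uniformly (single pass, so a few bins
    # may end slightly above the ceiling; this is a common simplification).
    hist = np.minimum(hist, ceiling) + excess / bins
    # The normalized cumulative histogram becomes the intensity lookup table.
    cdf = np.cumsum(hist) / hist.sum()
    lookup = np.round(cdf * (bins - 1)).astype(np.uint8)
    return lookup[tile]


# Synthetic low-contrast tile (assumption): intensities confined to 100..120.
rng = np.random.default_rng(0)
tile = rng.integers(100, 121, size=(64, 64), dtype=np.uint8)
enhanced = clip_limited_equalize(tile)
print(int(tile.max()) - int(tile.min()), int(enhanced.max()) - int(enhanced.min()))
```

The printed pair compares the intensity range before and after; the clip limit controls how far the range is stretched, and a lower limit gives a gentler enhancement.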