Medical Image Classification with KAN-Integrated Transformers and Dilated Neighborhood Attention

Omid Nejati Manzari; Hojat Asgariandehkordi; Taha Koleilat; Yiming Xiao; Hassan Rivaz

arXiv:2502.13693 · cs.CV · November 4, 2025 · 2 citations

TL;DR

MedViTV2 is a novel medical image classification model that integrates KAN layers and Dilated Neighborhood Attention to improve accuracy and efficiency, especially on corrupted data, outperforming previous methods across multiple datasets.

Contribution

Introduces MedViTV2, which combines Kolmogorov-Arnold Network (KAN) layers and Dilated Neighborhood Attention (DiNA) with a hierarchical hybrid strategy for robust, efficient medical image classification that addresses real-world data corruption.
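
To make the attention pattern concrete, here is a minimal 1-D sketch of dilated neighborhood attention in plain Python. It is not the MedViTV2 implementation: the function names, the single-head formulation, and the boundary rule (shifting the window so it stays in bounds rather than padding) are assumptions chosen for illustration.

```python
import math

def dilated_neighbors(i, n, kernel_size, dilation):
    """Indices that position i attends to in a 1-D sequence of length n."""
    # Tokens sharing i's residue modulo the dilation form a subsampled grid.
    group = list(range(i % dilation, n, dilation))
    if len(group) < kernel_size:
        raise ValueError("sequence too short for this kernel_size and dilation")
    pos = i // dilation  # index of i inside its group
    # Assumed boundary rule: shift the window so it stays in bounds (no padding).
    start = min(max(pos - kernel_size // 2, 0), len(group) - kernel_size)
    return group[start:start + kernel_size]

def neighborhood_attention(q, k, v, kernel_size, dilation):
    """Single-head attention where each query sees only its dilated neighborhood."""
    n, dim = len(q), len(q[0])
    out = []
    for i in range(n):
        idx = dilated_neighbors(i, n, kernel_size, dilation)
        # Scaled dot-product scores against the selected keys only.
        scores = [sum(a * b for a, b in zip(q[i], k[j])) / math.sqrt(dim) for j in idx]
        m = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - m) for s in scores]
        total = sum(weights)
        out.append([sum(w / total * v[j][c] for w, j in zip(weights, idx))
                    for c in range(dim)])
    return out

# With kernel_size=3 and dilation=2, token 5 of 12 attends to tokens 3, 5 and 7.
print(dilated_neighbors(5, 12, kernel_size=3, dilation=2))
```

Each query touches only `kernel_size` keys, so the cost grows linearly with sequence length, while a larger dilation widens the receptive field at no extra cost.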

Findings

01. Achieved state-of-the-art results on 27 out of 29 datasets.
02. Reduced computational complexity by 44% relative to the original MedViT.
03. Improved accuracy by up to 13.4% on benchmarks.

Abstract

Convolutional networks, transformers, hybrid models, and Mamba-based architectures have demonstrated strong performance across various medical image classification tasks. However, these methods were primarily designed to classify clean images using labeled data. In contrast, real-world clinical data often involve image corruptions that are unique to multi-center studies and stem from variations in imaging equipment across manufacturers. In this paper, we introduce the Medical Vision Transformer (MedViTV2), a novel architecture incorporating Kolmogorov-Arnold Network (KAN) layers into the transformer architecture for the first time, aiming for generalized medical image classification. We have developed an efficient KAN block to reduce computational load while enhancing the accuracy of the original MedViT. Additionally, to counteract the fragility of our MedViT when scaled up, we propose…
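
For readers unfamiliar with KAN layers, the sketch below shows the core idea in plain Python: every input-to-output edge carries its own learnable univariate function, and each output sums those functions over the inputs. It is only an illustration, not the efficient KAN block the abstract refers to. The class name `TinyKANLayer`, the Gaussian basis (the original KAN formulation uses B-splines), and all hyperparameters are assumptions.

```python
import math
import random

class TinyKANLayer:
    """KAN-style layer: out[j] = sum_i phi_ji(x[i]), with a learnable phi per edge."""

    def __init__(self, in_dim, out_dim, num_basis=5, x_min=-2.0, x_max=2.0, seed=0):
        rng = random.Random(seed)
        self.in_dim, self.out_dim = in_dim, out_dim
        # Fixed, evenly spaced centers for the Gaussian basis (an assumption;
        # B-splines are the usual choice in the KAN literature).
        step = (x_max - x_min) / (num_basis - 1)
        self.centers = [x_min + b * step for b in range(num_basis)]
        self.width = step
        # coef[j][i][b]: weight of basis function b on the edge input i -> output j.
        self.coef = [[[rng.gauss(0.0, 0.1) for _ in range(num_basis)]
                      for _ in range(in_dim)] for _ in range(out_dim)]

    def basis(self, x):
        """Evaluate every Gaussian basis function at the scalar x."""
        return [math.exp(-((x - c) / self.width) ** 2) for c in self.centers]

    def edge_fn(self, j, i, x):
        """Univariate function phi_ji evaluated at x."""
        return sum(w * b for w, b in zip(self.coef[j][i], self.basis(x)))

    def forward(self, xs):
        """Sum the edge functions over inputs; there is no separate weight matrix."""
        return [sum(self.edge_fn(j, i, x) for i, x in enumerate(xs))
                for j in range(self.out_dim)]

layer = TinyKANLayer(in_dim=3, out_dim=2)
print(layer.forward([0.1, -0.5, 1.2]))
```

Unlike a linear layer followed by a fixed activation, the nonlinearity here sits on the edges and is itself what training would adjust (the `coef` entries). Training code is omitted.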

Code & Models

Repositories

Omid-Nejati/MedViTV2 (PyTorch, official)

Taxonomy

Topics: Brain Tumor Detection and Classification · Medical Imaging and Analysis · AI in cancer detection

Methods: Attention Is All You Need · Linear Layer · Byte Pair Encoding · Layer Normalization · Residual Connection · Dense Connections · Multi-Head Attention · Position-Wise Feed-Forward Layer · Adam · Softmax