Multi-Scale Transformer Architecture for Accurate Medical Image Classification
Jiacheng Hu, Yanlin Xiang, Yang Lin, Junliang Du, Hanchao Zhang, Houze Liu

TL;DR
This paper presents a multi-scale Transformer architecture that significantly improves skin lesion classification accuracy and interpretability, outperforming existing models on the ISIC 2017 dataset.
Contribution
It introduces a novel multi-scale feature fusion mechanism within a Transformer model tailored for medical image classification, enhancing global and local feature extraction.
Findings
Outperforms ResNet50, VGG19, ResNext, and Vision Transformer on the ISIC 2017 dataset
Achieves higher accuracy, AUC, F1-Score, and Precision than these baselines
Provides interpretable Grad-CAM visualizations aligning with lesion sites
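For readers unfamiliar with the four reported metrics, the following is a minimal sketch of how they are commonly computed for a binary task (for example, melanoma vs. not melanoma). The function name `binary_metrics`, the 0.5 threshold, and the toy labels and scores are assumptions made for illustration; they are not taken from the paper, and the paper's exact evaluation protocol is not stated on this page.

```python
def binary_metrics(y_true, y_score, threshold=0.5):
    """Return accuracy, precision, F1 at a threshold, plus rank-based AUC.

    y_true:  list of 0/1 ground-truth labels
    y_score: list of predicted probabilities for the positive class
    """
    # Hard predictions at the chosen threshold (assumed 0.5 here).
    y_pred = [1 if s >= threshold else 0 for s in y_score]

    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)

    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)

    # AUC as the probability that a random positive outscores a random
    # negative, counting ties as half a win (threshold-independent).
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    auc = wins / (len(pos) * len(neg)) if pos and neg else float("nan")

    return {"accuracy": accuracy, "precision": precision, "f1": f1, "auc": auc}


# Toy example (made-up numbers, not results from the paper).
metrics = binary_metrics([1, 0, 1, 0], [0.9, 0.6, 0.4, 0.1])
print(metrics)
```

Accuracy, Precision, and F1-Score depend on the chosen threshold, while AUC summarizes ranking quality across all thresholds, which is why the two kinds of metric are usually reported together.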
Abstract
This study introduces an AI-driven skin lesion classification algorithm built on an enhanced Transformer architecture, addressing the challenges of accuracy and robustness in medical image analysis. By integrating a multi-scale feature fusion mechanism and refining the self-attention process, the model effectively extracts both global and local features, enhancing its ability to detect lesions with ambiguous boundaries and intricate structures. Performance evaluation on the ISIC 2017 dataset demonstrates that the improved Transformer surpasses established AI models, including ResNet50, VGG19, ResNext, and Vision Transformer, across key metrics such as accuracy, AUC, F1-Score, and Precision. Grad-CAM visualizations further highlight the interpretability of the model, showcasing strong alignment between the algorithm's focus areas and actual lesion sites. This research underscores the…
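The abstract does not describe how the multi-scale fusion is implemented, so the following is only a toy sketch of one common way to combine scales: pool the image into tokens at several patch sizes, concatenate the tokens, and let a single self-attention step mix fine (local) and coarse (global) information. Everything here is an assumption for illustration, including the names `patch_tokens`, `attend`, and `multi_scale_descriptor`, the patch sizes, the two hand-picked token features, and the absence of learned projections. It is not the authors' architecture.

```python
import math


def patch_tokens(image, patch):
    """Split a square 2D image into non-overlapping patch x patch blocks.

    Each block becomes a token [mean intensity, intensity range].
    Assumes the image side length is divisible by `patch`.
    """
    n = len(image)
    tokens = []
    for r in range(0, n, patch):
        for c in range(0, n, patch):
            block = [image[i][j]
                     for i in range(r, r + patch)
                     for j in range(c, c + patch)]
            tokens.append([sum(block) / len(block), max(block) - min(block)])
    return tokens


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attend(tokens):
    """Single-head scaled dot-product self-attention.

    Queries, keys, and values are the tokens themselves (identity
    projections), a simplification of a real Transformer layer.
    """
    dim = len(tokens[0])
    scale = math.sqrt(dim)
    out = []
    for q in tokens:
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in tokens]
        weights = softmax(scores)
        out.append([sum(w * v[d] for w, v in zip(weights, tokens))
                    for d in range(dim)])
    return out


def multi_scale_descriptor(image, patch_sizes=(1, 2, 4)):
    """Fuse tokens from several scales with attention, then mean-pool."""
    tokens = [t for p in patch_sizes for t in patch_tokens(image, p)]
    mixed = attend(tokens)
    dim = len(mixed[0])
    return [sum(v[d] for v in mixed) / len(mixed) for d in range(dim)]


# Toy 4x4 "image": a bright 2x2 blob on a dark background.
image = [
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 1.0, 0.0],
    [0.0, 1.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
]
print(multi_scale_descriptor(image))
```

Because every scale contributes tokens to the same attention step, a fine-grained token near a lesion boundary can attend to a coarse token summarizing the whole image, which is the intuition behind combining global and local features.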
Taxonomy
Topics: Brain Tumor Detection and Classification
