CoMViT: An Efficient Vision Backbone for Supervised Classification in Medical Imaging
Aon Safdar, Mohamed Saadeldin

TL;DR
CoMViT is a compact Vision Transformer for medical image classification that matches or outperforms larger CNN and ViT models with up to 5-20x fewer parameters, while producing interpretable Grad-CAM visualizations.
Contribution
The paper introduces CoMViT, a compact ViT architecture for resource-constrained medical image analysis. It combines a convolutional tokenizer with diagonal masking, dynamic temperature scaling, and pooling-based sequence aggregation to improve performance and generalization.
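A convolutional tokenizer replaces ViT patch slicing with a small convolutional stage whose feature-map positions become tokens. The sketch below is a minimal pure-Python illustration, assuming a CCT-style design (3×3 convolution with zero padding, ReLU, then 2×2 max pooling) applied to a single-channel image. The function names and kernel values are hypothetical, and CoMViT's actual layer sizes are not stated here.

```python
from typing import List

Matrix = List[List[float]]


def conv2d_same(img: Matrix, kernel: Matrix) -> Matrix:
    """3x3 cross-correlation (as in deep-learning libraries), stride 1, zero padding 1."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for di in range(-1, 2):
                for dj in range(-1, 2):
                    y, x = i + di, j + dj
                    if 0 <= y < h and 0 <= x < w:  # zero padding outside the image
                        acc += img[y][x] * kernel[di + 1][dj + 1]
            out[i][j] = acc
    return out


def relu(m: Matrix) -> Matrix:
    return [[max(0.0, v) for v in row] for row in m]


def max_pool2x2(m: Matrix) -> Matrix:
    """2x2 max pooling with stride 2 (assumes even height and width)."""
    return [
        [max(m[i][j], m[i][j + 1], m[i + 1][j], m[i + 1][j + 1])
         for j in range(0, len(m[0]), 2)]
        for i in range(0, len(m), 2)
    ]


def conv_tokenize(img: Matrix, kernels: List[Matrix]) -> List[List[float]]:
    """Hypothetical tokenizer: conv -> ReLU -> max-pool per kernel, then one token
    per pooled spatial location (row-major), with one feature per kernel."""
    maps = [max_pool2x2(relu(conv2d_same(img, k))) for k in kernels]
    h, w = len(maps[0]), len(maps[0][0])
    return [[fm[i][j] for fm in maps] for i in range(h) for j in range(w)]
```

For a 28×28 input (MedMNIST's default resolution), one conv-pool stage yields a 14×14 map, i.e. 196 tokens with one feature per kernel. Pooling before tokenization shortens the sequence fourfold, which cuts the quadratic attention cost by roughly 16x.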
Findings
Achieves robust performance across 12 MedMNIST datasets.
Maintains high accuracy with only ~4.5M parameters, matching or outperforming deeper CNN and ViT variants.
Provides interpretable Grad-CAM visualizations highlighting clinically relevant regions.
Abstract
Vision Transformers (ViTs) have demonstrated strong potential in medical imaging; however, their high computational demands and tendency to overfit on small datasets limit their applicability in real-world clinical scenarios. In this paper, we present CoMViT, a compact and generalizable Vision Transformer architecture optimized for resource-constrained medical image analysis. CoMViT integrates a convolutional tokenizer, diagonal masking, dynamic temperature scaling, and pooling-based sequence aggregation to improve performance and generalization. Through systematic architectural optimization, CoMViT achieves robust performance across twelve MedMNIST datasets while maintaining a lightweight design with only ~4.5M parameters. It matches or outperforms deeper CNN and ViT variants, offering up to 5-20x parameter reduction without sacrificing accuracy. Qualitative Grad-CAM analyses show that…
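Diagonal masking and temperature scaling act inside self-attention, while pooling-based aggregation replaces the class token at the output. The following single-head, pure-Python sketch assumes the locality self-attention formulation: the score matrix's diagonal is set to −∞ so each token attends only to other tokens, and scores are divided by a learnable temperature instead of a fixed √d. For aggregation it assumes CCT-style sequence pooling: score each token with a linear layer, softmax over the sequence, and take the weighted sum. Here `temperature`, `w`, and `b` are stand-ins for trained parameters, and CoMViT's "dynamic temperature scaling" may differ in detail.

```python
import math
from typing import List

Matrix = List[List[float]]


def softmax(row: List[float]) -> List[float]:
    """Numerically stable softmax; -inf entries receive zero weight."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def diag_masked_attention(q: Matrix, k: Matrix, v: Matrix, temperature: float) -> Matrix:
    """Single-head self-attention with diagonal masking and temperature scaling.

    temperature stands in for a learnable scalar; needs at least two tokens.
    """
    n = len(q)
    out = []
    for i in range(n):
        # Dot-product scores of token i against every token, divided by the temperature.
        scores = [sum(a * b for a, b in zip(q[i], k[j])) / temperature for j in range(n)]
        scores[i] = -math.inf  # diagonal mask: a token may not attend to itself
        weights = softmax(scores)
        # Weighted sum of the value vectors.
        out.append([sum(wt * v[j][c] for j, wt in enumerate(weights)) for c in range(len(v[0]))])
    return out


def sequence_pool(tokens: Matrix, w: List[float], b: float = 0.0) -> List[float]:
    """Attention-based pooling: score each token linearly, softmax over the
    sequence, and return the weighted sum of tokens (one vector per image)."""
    logits = [sum(wi * xi for wi, xi in zip(w, t)) + b for t in tokens]
    alpha = softmax(logits)
    return [sum(a * t[c] for a, t in zip(alpha, tokens)) for c in range(len(tokens[0]))]
```

With only two tokens, masking the diagonal forces each token to take the other's value, which makes the effect easy to check by hand. With zero scoring weights, sequence pooling reduces to the mean of the tokens.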
Taxonomy
Topics: Advanced Neural Network Applications · COVID-19 diagnosis using AI · Brain Tumor Detection and Classification
