CNN and ViT Efficiency Study on Tiny ImageNet and DermaMNIST Datasets
Aidar Amangeldi, Angsar Taigonyrov, Muhammad Huzaifa Jawad, Chinedu Emmanuel Mbonu

TL;DR
This paper compares convolutional and transformer-based neural networks on Tiny ImageNet and DermaMNIST, showing that fine-tuned Vision Transformers can achieve comparable or better accuracy with reduced inference time and complexity.
Contribution
It introduces a fine-tuning strategy for Vision Transformers that improves efficiency and performance on medical and general image classification tasks.
Findings
Vision Transformers can match or outperform ResNet-18 after fine-tuning.
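After fine-tuning, Vision Transformers can achieve faster inference and operate with fewer parameters than the ResNet-18 baseline.
These gains make fine-tuned Vision Transformers viable for deployment in resource-constrained environments.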
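Inference-speed comparisons like the one in these findings depend on how latency is measured. The summary does not describe the authors' benchmarking procedure, so the following is only a minimal sketch of one common approach: discard a few warm-up calls, then time repeated single-sample predictions and report robust statistics. The names `measure_latency` and `dummy_model` are hypothetical, and `dummy_model` is a stand-in for a real network's forward pass.

```python
import statistics
import time


def measure_latency(predict, sample, warmup=5, runs=30):
    """Time repeated calls to predict(sample) and return latencies in milliseconds.

    Warm-up calls are discarded so that one-off costs (caches, lazy
    initialisation) do not distort the reported numbers.
    """
    for _ in range(warmup):
        predict(sample)

    timings_ms = []
    for _ in range(runs):
        start = time.perf_counter()
        predict(sample)
        timings_ms.append((time.perf_counter() - start) * 1000.0)

    return {
        "min_ms": min(timings_ms),
        "median_ms": statistics.median(timings_ms),
        "mean_ms": statistics.fmean(timings_ms),
    }


def dummy_model(pixels):
    # Hypothetical stand-in for a model forward pass: any callable works here.
    return sum(pixels) % 7


if __name__ == "__main__":
    fake_image = list(range(3 * 64 * 64))  # placeholder input, not real data
    stats = measure_latency(dummy_model, fake_image)
    print({name: round(value, 4) for name, value in stats.items()})
```

When timing a real model on a GPU, the device should be synchronised before each clock read, because kernel launches are asynchronous; with PyTorch that would typically mean calling `torch.cuda.synchronize()`. Reporting the median rather than the mean reduces the influence of occasional slow runs.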
Abstract
This study evaluates the trade-offs between convolutional and transformer-based architectures on both medical and general-purpose image classification benchmarks. We use ResNet-18 as our baseline and introduce a fine-tuning strategy applied to four Vision Transformer variants (Tiny, Small, Base, Large) on DermaMNIST and Tiny ImageNet. Our goal is to reduce inference latency and model complexity with acceptable accuracy degradation. Through systematic hyperparameter variations, we demonstrate that appropriately fine-tuned Vision Transformers can match or exceed the baseline's performance, achieve faster inference, and operate with fewer parameters, highlighting their viability for deployment in resource-constrained environments.
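To make the model-complexity comparison concrete, the sketch below counts learnable parameters analytically for the four Vision Transformer sizes. It rests on assumptions that the abstract does not confirm: the widely used configurations (embedding width 192/384/768/1024, depth 12/12/12/24), 16x16 patches, inputs resized to 224x224, and an MLP ratio of 4. The class counts (200 for Tiny ImageNet, 7 for DermaMNIST) are those of the public datasets, and the ResNet-18 figure is the commonly cited torchvision count. `ViTConfig`, `vit_param_count`, and `resnet18_param_count` are hypothetical helper names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ViTConfig:
    name: str
    embed_dim: int
    depth: int


# Commonly used ViT sizes (an assumption; the paper's exact variants may differ).
CONFIGS = [
    ViTConfig("Tiny", 192, 12),
    ViTConfig("Small", 384, 12),
    ViTConfig("Base", 768, 12),
    ViTConfig("Large", 1024, 24),
]

# torchvision ResNet-18 without its final classifier layer
# (11,689,512 parameters in total with the 1000-class head).
RESNET18_BACKBONE_PARAMS = 11_176_512


def vit_param_count(cfg, num_classes, image_size=224, patch_size=16,
                    channels=3, mlp_ratio=4):
    """Count learnable parameters of a standard ViT classifier."""
    d = cfg.embed_dim
    num_patches = (image_size // patch_size) ** 2

    patch_embed = channels * patch_size * patch_size * d + d  # conv weight + bias
    cls_token = d
    pos_embed = (num_patches + 1) * d                         # patches + class token

    attention = (3 * d * d + 3 * d) + (d * d + d)             # QKV + output projection
    hidden = mlp_ratio * d
    mlp = (d * hidden + hidden) + (hidden * d + d)            # two linear layers
    layer_norms = 2 * (2 * d)                                 # two LayerNorms per block
    block = attention + mlp + layer_norms

    final_norm = 2 * d
    head = d * num_classes + num_classes
    return patch_embed + cls_token + pos_embed + cfg.depth * block + final_norm + head


def resnet18_param_count(num_classes):
    """ResNet-18 backbone plus a 512-to-num_classes linear classifier."""
    return RESNET18_BACKBONE_PARAMS + 512 * num_classes + num_classes


if __name__ == "__main__":
    for dataset, classes in [("Tiny ImageNet", 200), ("DermaMNIST", 7)]:
        print(f"{dataset}: ResNet-18 {resnet18_param_count(classes):,}")
        for cfg in CONFIGS:
            print(f"  ViT-{cfg.name}: {vit_param_count(cfg, classes):,}")
```

The number of attention heads does not appear in the formula because splitting the embedding across heads leaves the size of the projection matrices unchanged. Parameter count is only one proxy for complexity; measured latency also depends on input resolution and hardware.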
Taxonomy
Methods: Attention Is All You Need · Linear Layer · Multi-Head Attention · Dense Connections · Vision Transformer · Dropout · Layer Normalization · Position-Wise Feed-Forward Layer · Byte Pair Encoding · Softmax
