Hands-on Evaluation of Visual Transformers for Object Recognition and Detection
Dimitrios N. Vlachogiannis, Dimitrios A. Koutsomitropoulos

TL;DR
This paper evaluates various Vision Transformers (ViTs) for object recognition, detection, and medical imaging, demonstrating their competitive performance and advantages over traditional CNNs in understanding global image context.
Contribution
It provides a comprehensive comparison of pure, hierarchical, and hybrid ViTs against CNNs across multiple tasks and datasets, highlighting the effectiveness of hierarchical and hybrid models such as Swin and CvT.
Findings
Hybrid and hierarchical ViTs outperform CNNs in accuracy and efficiency.
Data augmentation significantly improves medical image classification performance.
Swin Transformer achieves a strong balance between accuracy and computational cost.
Abstract
Convolutional Neural Networks (CNNs) for computer vision sometimes struggle with understanding images in a global context, as they mainly focus on local patterns. On the other hand, Vision Transformers (ViTs), inspired by models originally created for language processing, use self-attention mechanisms, which allow them to understand relationships across the entire image. In this paper, we compare different types of ViTs (pure, hierarchical, and hybrid) against traditional CNN models across various tasks, including object recognition, detection, and medical image classification. We conduct thorough tests on standard datasets like ImageNet for image classification and COCO for object detection. Additionally, we apply these models to medical imaging using the ChestX-ray14 dataset. We find that hybrid and hierarchical transformers, especially Swin and CvT, offer a strong balance between…
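To make the contrast between local convolution and global self-attention concrete, the following is a minimal sketch of single-head scaled dot-product self-attention over a handful of patch embeddings. It is not the paper's implementation: the patch vectors are made-up values, the function names are hypothetical, and the learned query/key/value projections of a real ViT are replaced by the identity to keep the example short.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Single-head scaled dot-product self-attention.

    Assumption: queries, keys, and values are the tokens themselves
    (identity projections); a real ViT learns separate linear maps.
    Returns (outputs, weights), where weights[i][j] is how much
    patch i attends to patch j.
    """
    d = len(tokens[0])
    scale = 1.0 / math.sqrt(d)
    weights = []
    for q in tokens:
        # Every patch is scored against every other patch, not just neighbours.
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in tokens]
        weights.append(softmax(scores))
    outputs = [
        [sum(w * v[j] for w, v in zip(row, tokens)) for j in range(d)]
        for row in weights
    ]
    return outputs, weights


# Four hypothetical patch embeddings (e.g. a 2x2 grid of patches), dimension 3.
patches = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
    [1.0, 1.0, 0.0],
]
outputs, weights = self_attention(patches)
```

Each row of `weights` is a probability distribution over all patches, so every output vector mixes information from the whole image in one layer. A convolution with a small kernel would instead combine only spatially adjacent patches.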
Taxonomy
Topics: Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning · COVID-19 diagnosis using AI
