A Comprehensive Study of Vision Transformers on Dense Prediction Tasks
Kishaan Jeeveswaran, Senthilkumar Kathiresan, Arnav Varma, Omar Magdy, Bahram Zonooz, and Elahe Arani

TL;DR
This paper compares Vision Transformers (VTs) and CNNs as feature extractors in dense prediction tasks, showing that VTs are more robust and less texture-biased, while CNNs perform better at higher image resolutions in object detection.
Contribution
It provides an extensive empirical comparison of VTs and CNNs, highlighting their differences in robustness, reliability, and texture bias in dense prediction tasks.
Findings
VTs are more robust to distribution shifts and adversarial attacks.
CNNs perform better at higher image resolutions in object detection.
VTs produce more reliable and less texture-biased predictions.
Abstract
Convolutional Neural Networks (CNNs), architectures consisting of convolutional layers, have been the standard choice in vision tasks. Recent studies have shown that Vision Transformers (VTs), architectures based on self-attention modules, achieve comparable performance in challenging tasks such as object detection and semantic segmentation. However, the image processing mechanism of VTs is different from that of conventional CNNs. This poses several questions about their generalizability, robustness, reliability, and texture bias when used to extract features for complex tasks. To address these questions, we study and compare VT and CNN architectures as feature extractors in object detection and semantic segmentation. Our extensive empirical results show that the features generated by VTs are more robust to distribution shifts, natural corruptions, and adversarial attacks in both…
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Advanced Neural Network Applications · COVID-19 diagnosis using AI
