How Well Do Vision Transformers (VTs) Transfer To The Non-Natural Image Domain? An Empirical Study Involving Art Classification
Vincent Tonkes, Matthia Sabatelli

TL;DR
This study empirically evaluates the transfer learning capabilities of Vision Transformers (VTs) in non-natural image domains, specifically art classification, comparing their performance to CNNs.
Contribution
It provides the first comprehensive comparison of VTs and CNNs in transferring learned representations from natural to non-natural images.
Findings
VTs outperform CNNs in transfer learning for art classification
VTs demonstrate strong generalization across different non-natural image tasks
Transformers are more effective feature extractors than CNNs in this context
Abstract
Vision Transformers (VTs) are becoming a valuable alternative to Convolutional Neural Networks (CNNs) when it comes to problems involving high-dimensional and spatially organized inputs such as images. However, their Transfer Learning (TL) properties are not yet well studied, and it is not fully known whether these neural architectures can transfer across different domains as well as CNNs. In this paper we study whether VTs that are pre-trained on the popular ImageNet dataset learn representations that are transferable to the non-natural image domain. To do so we consider three well-studied art classification problems and use them as a surrogate for studying the TL potential of four popular VTs. Their performance is extensively compared against that of four common CNNs across several TL experiments. Our results show that VTs exhibit strong generalization properties and that these…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Code & Models
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsVisual Attention and Saliency Detection · Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning
