SiT: Self-supervised vIsion Transformer
Sara Atito, Muhammad Awais, Josef Kittler

TL;DR
This paper introduces SiT, a self-supervised vision transformer that effectively learns useful image representations from small datasets, outperforming existing methods and excelling in few-shot learning scenarios.
Contribution
The paper proposes SiT, a flexible self-supervised training framework for vision transformers, demonstrating superior performance over existing methods in image classification and few-shot learning.
Findings
Outperforms existing self-supervised methods significantly
Effective for few-shot learning with small datasets
Learns useful representations, as assessed by linear classifiers trained on the learned features
Abstract
Self-supervised learning methods are gaining increasing traction in computer vision due to their recent success in reducing the gap with supervised learning. In natural language processing (NLP), self-supervised learning and transformers are already the methods of choice. The recent literature suggests that transformers are also becoming increasingly popular in computer vision. So far, vision transformers have been shown to work well when pretrained either using large-scale supervised data or with some kind of co-supervision, e.g. in terms of a teacher network. These supervised pretrained vision transformers achieve very good results in downstream tasks with minimal changes. In this work we investigate the merits of self-supervised learning for pretraining image/vision transformers and then using them for downstream classification tasks. We propose Self-supervised vIsion…
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning
