DinoTwins: Combining DINO and Barlow Twins for Robust, Label-Efficient Vision Transformers
Michael Podsiadly, Brendon K Lay

TL;DR
This paper introduces DinoTwins, a hybrid self-supervised learning approach combining DINO and Barlow Twins to improve label efficiency and robustness of vision transformers, especially in resource-limited settings.
Contribution
It presents a novel combination of DINO and Barlow Twins techniques, leveraging their strengths to enhance self-supervised learning for vision transformers.
Findings
Achieves comparable performance to DINO with fewer labels.
Maintains strong feature representations suitable for downstream tasks.
Shows improved semantic segmentation capabilities.
Abstract
Training AI models to understand images without costly labeled data remains a challenge. We combine two techniques, DINO (teacher-student learning) and Barlow Twins (redundancy reduction), to create a model that learns better with fewer labels and less compute. While both DINO and Barlow Twins have independently demonstrated strong performance in self-supervised learning, each comes with limitations: DINO may be sensitive to certain augmentations, and Barlow Twins often requires batch sizes too large to fit on consumer hardware. By combining the redundancy-reduction objective of Barlow Twins with the self-distillation strategy of DINO, we aim to leverage their complementary strengths. We train a hybrid model on the MS COCO dataset using only 10% of labeled data for linear probing, and evaluate its performance against standalone DINO and Barlow Twins implementations. Preliminary results…
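To make the idea of combining the two objectives concrete, here is a minimal NumPy sketch of how a DINO-style self-distillation term and a Barlow Twins-style redundancy-reduction term might be added together. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not say how the two losses are weighted, so the mixing weight `alpha`, the temperatures, the off-diagonal weight `lambda_offdiag`, the EMA momentum, and all function names are hypothetical choices made only for this example.

```python
import numpy as np


def log_softmax(x, temp):
    """Numerically stable log-softmax over the last axis with a temperature."""
    x = x / temp
    x = x - x.max(axis=1, keepdims=True)
    return x - np.log(np.exp(x).sum(axis=1, keepdims=True))


def dino_loss(student_logits, teacher_logits, center, t_student=0.1, t_teacher=0.04):
    """Cross-entropy between a centered, sharpened teacher distribution and the student."""
    p_teacher = np.exp(log_softmax(teacher_logits - center, t_teacher))
    log_p_student = log_softmax(student_logits, t_student)
    return float(np.mean(-np.sum(p_teacher * log_p_student, axis=1)))


def barlow_twins_loss(z_a, z_b, lambda_offdiag=5e-3, eps=1e-6):
    """Redundancy-reduction loss on two batches of embeddings, each of shape (N, D)."""
    n = z_a.shape[0]
    # Standardize every feature dimension across the batch.
    z_a = (z_a - z_a.mean(axis=0)) / (z_a.std(axis=0) + eps)
    z_b = (z_b - z_b.mean(axis=0)) / (z_b.std(axis=0) + eps)
    c = z_a.T @ z_b / n  # (D, D) cross-correlation matrix between the two views
    diag = np.diag(c)
    on_diag = np.sum((diag - 1.0) ** 2)          # pull matching features toward correlation 1
    off_diag = np.sum(c ** 2) - np.sum(diag ** 2)  # push different features toward correlation 0
    return float(on_diag + lambda_offdiag * off_diag)


def hybrid_loss(student_logits, teacher_logits, center, z_a, z_b, alpha=0.5):
    """Weighted sum of the two terms; alpha is an assumed, hypothetical mixing weight."""
    return alpha * dino_loss(student_logits, teacher_logits, center) + (
        1.0 - alpha
    ) * barlow_twins_loss(z_a, z_b)


def ema_update(teacher_w, student_w, momentum=0.996):
    """Teacher weights follow the student as an exponential moving average."""
    return momentum * teacher_w + (1.0 - momentum) * student_w


# Toy usage with random arrays standing in for network outputs on two augmented views.
rng = np.random.default_rng(0)
student_logits = rng.normal(size=(64, 16))
teacher_logits = rng.normal(size=(64, 16))
center = np.zeros(16)  # in DINO this is a running mean of teacher outputs
z_a = rng.normal(size=(64, 8))
z_b = z_a + 0.1 * rng.normal(size=(64, 8))  # second view: a noisy copy of the first
print(hybrid_loss(student_logits, teacher_logits, center, z_a, z_b))
```

The Barlow Twins term is computed from a D x D cross-correlation matrix estimated over the batch, which is one way to see why that objective tends to favor larger batches. The distillation term, by contrast, is computed per sample against the teacher's output.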
