Self-Distilled Vision Transformer for Domain Generalization
Maryam Sultana, Muzammal Naseer, Muhammad Haris Khan, Salman Khan, Fahad Shahbaz Khan

TL;DR
This paper introduces a simple self-distillation method for vision transformers to improve their domain generalization ability, addressing overfitting issues without adding extra parameters, and demonstrates significant empirical gains across multiple datasets.
Contribution
Proposes a parameter-free self-distillation approach for ViTs that reduces overfitting in domain generalization tasks, compatible with various ViT architectures.
Findings
Significant performance improvements on five challenging datasets.
Outperforms recent state-of-the-art DG methods.
Compatible with multiple ViT backbones.
Abstract
In the recent past, several domain generalization (DG) methods have been proposed, showing encouraging performance; however, almost all of them build on convolutional neural networks (CNNs). There is little to no progress on studying the DG performance of vision transformers (ViTs), which are challenging the supremacy of CNNs on standard benchmarks, often built on the i.i.d. assumption. This renders the real-world deployment of ViTs doubtful. In this paper, we attempt to explore ViTs towards addressing the DG problem. Similar to CNNs, ViTs also struggle in out-of-distribution scenarios, and the main culprit is overfitting to source domains. Inspired by the modular architecture of ViTs, we propose a simple DG approach for ViTs, coined as self-distillation for ViTs. It reduces the overfitting of source domains by easing the learning of the input-output mapping problem through curating non-zero…
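The abstract is cut off before the method details, so the following is only a generic sketch of what a self-distillation loss can look like, not the paper's formulation. It assumes the final layer's softened predictions act as the teacher for an intermediate block's predictions. The function names, the temperature, and the weighting factor are all hypothetical choices made for illustration.

```python
import math

def softmax(logits, temperature=1.0):
    # Numerically stable softmax over a list of floats.
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q, eps=1e-12):
    # KL(p || q) for two discrete distributions of equal length.
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def cross_entropy(logits, label):
    # Standard cross-entropy of the predicted distribution against a class index.
    probs = softmax(logits)
    return -math.log(probs[label] + 1e-12)

def self_distillation_loss(final_logits, intermediate_logits, label,
                           temperature=3.0, weight=0.5):
    # Assumption: the final-layer prediction is the teacher and an
    # intermediate block's prediction is the student. Both come from the
    # same network, so no extra parameters are introduced.
    teacher = softmax(final_logits, temperature)
    student = softmax(intermediate_logits, temperature)
    supervised = cross_entropy(final_logits, label)
    distill = kl_divergence(teacher, student)
    return supervised + weight * distill

if __name__ == "__main__":
    final = [2.0, 0.5, -1.0]         # hypothetical final-layer logits
    intermediate = [1.0, 0.8, -0.2]  # hypothetical intermediate-block logits
    print(self_distillation_loss(final, intermediate, label=0))
```

When the intermediate prediction already matches the final one, the distillation term vanishes and the loss reduces to plain cross-entropy. Any mismatch adds a penalty scaled by `weight`.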
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications · Remote-Sensing Image Classification
