Self-Distilled Vision Transformer for Domain Generalization

Maryam Sultana; Muzammal Naseer; Muhammad Haris Khan; Salman Khan; Fahad Shahbaz Khan

arXiv:2207.12392·cs.CV·October 6, 2022·1 cites

Open Access 2 Repos

TL;DR

This paper introduces a simple self-distillation method for vision transformers to improve their domain generalization ability, addressing overfitting issues without adding extra parameters, and demonstrates significant empirical gains across multiple datasets.

Contribution

Proposes a parameter-free self-distillation approach for ViTs that reduces overfitting in domain generalization tasks, compatible with various ViT architectures.
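To make the "parameter-free self-distillation" idea concrete, here is a minimal sketch of a generic self-distillation loss, not the authors' implementation. It assumes that each transformer block's output can be scored by the same shared classification head (so no parameters are added), that one intermediate block is picked at random per step, and that the final block's softened prediction serves as the teacher. The names `self_distillation_loss`, `lam`, and `temperature` are hypothetical and chosen only for illustration.

```python
import math
import random


def log_softmax(logits, temperature=1.0):
    """Numerically stable log-softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    lse = m + math.log(sum(math.exp(z - m) for z in scaled))
    return [z - lse for z in scaled]


def cross_entropy(logits, label):
    """Standard cross-entropy of one example against a hard label."""
    return -log_softmax(logits)[label]


def kl_divergence(log_p, log_q):
    """KL(p || q) computed from log-probabilities."""
    return sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))


def self_distillation_loss(block_logits, label, lam=0.5, temperature=3.0, rng=random):
    """Cross-entropy on the final block plus a distillation term.

    block_logits: one logit vector per block, all assumed to come from the
    same shared classifier head (an assumption of this sketch).
    Returns (loss, index of the randomly chosen intermediate block).
    """
    if len(block_logits) < 2:
        raise ValueError("need at least one intermediate block and a final block")
    final_logits = block_logits[-1]
    # Pick one intermediate block at random to act as the student.
    idx = rng.randrange(len(block_logits) - 1)
    teacher = log_softmax(final_logits, temperature)
    student = log_softmax(block_logits[idx], temperature)
    ce = cross_entropy(final_logits, label)
    kd = kl_divergence(teacher, student)
    return ce + lam * kd, idx
```

In this sketch the distillation term vanishes when the chosen intermediate block already agrees with the final block, and grows as their softened predictions diverge; `lam` trades that term off against the ordinary classification loss.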

Findings

01 Significant performance improvements on five challenging datasets.

02 Outperforms recent state-of-the-art DG methods.

03 Compatible with multiple ViT backbones.

Abstract

In the recent past, several domain generalization (DG) methods have been proposed, showing encouraging performance; however, almost all of them build on convolutional neural networks (CNNs). There is little to no progress on studying the DG performance of vision transformers (ViTs), which are challenging the supremacy of CNNs on standard benchmarks, often built on the i.i.d. assumption. This renders the real-world deployment of ViTs doubtful. In this paper, we attempt to explore ViTs towards addressing the DG problem. Similar to CNNs, ViTs also struggle in out-of-distribution scenarios, and the main culprit is overfitting to source domains. Inspired by the modular architecture of ViTs, we propose a simple DG approach for ViTs, coined as self-distillation for ViTs. It reduces the overfitting to source domains by easing the learning of the input-output mapping problem through curating non-zero…

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications · Remote-Sensing Image Classification