TVT: Transferable Vision Transformer for Unsupervised Domain Adaptation
Jinyu Yang, Jingjing Liu, Ning Xu, Junzhou Huang

TL;DR
This paper introduces TVT, a Vision Transformer-based framework for unsupervised domain adaptation. It shows that ViT transfers better than CNN-based methods, and it improves performance further with a transferability adaption module (TAM) and discriminative clustering.
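Discriminative clustering on unlabeled target data is commonly written as an information-maximization objective: make each prediction confident (low per-sample entropy) while keeping the batch-average prediction spread across classes (high marginal entropy). The sketch below computes that generic loss from raw logits in pure Python. It is an assumed, common formulation for illustration, not code from the TVT authors, and the function names are hypothetical.

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    # Subtract the max logit for numerical stability.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]


def entropy(p: List[float], eps: float = 1e-12) -> float:
    # Shannon entropy in nats; eps guards against log(0).
    return -sum(pi * math.log(pi + eps) for pi in p)


def info_max_loss(batch_logits: List[List[float]]) -> float:
    """Hypothetical discriminative-clustering loss for a batch of target logits."""
    probs = [softmax(z) for z in batch_logits]
    n = len(probs)
    k = len(probs[0])
    # Mean per-sample entropy: small when each prediction is confident.
    cond_ent = sum(entropy(p) for p in probs) / n
    # Entropy of the batch-average prediction: large when classes are used evenly.
    mean_p = [sum(p[j] for p in probs) / n for j in range(k)]
    marg_ent = entropy(mean_p)
    # Minimizing this favors confident predictions that do not collapse to one class.
    return cond_ent - marg_ent
```

A confident batch that uses both classes scores lower (better) than a confident batch that collapses onto a single class, which is the behavior the clustering term is meant to reward.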
Contribution
The paper proposes the Transferable Vision Transformer (TVT), a new framework that leverages ViT's intrinsic features for improved domain adaptation, including a novel Transferability Adaption Module (TAM).
Findings
ViT shows superior transferability over CNNs in UDA tasks.
TVT achieves significant improvements over state-of-the-art methods.
Incorporating TAM enhances ViT's focus on transferable features.
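One way to make a ViT "focus on transferable features" is to score each image patch by how hard a domain discriminator finds it to classify, then reweight the class token's attention toward those domain-confusing patches. The sketch below illustrates that idea with binary entropy of a per-patch source-domain probability. The inputs (`cls_attn`, `domain_probs`), the entropy-based score, and the renormalization step are simplifying assumptions for illustration; this is not presented as the paper's exact TAM.

```python
import math
from typing import List


def binary_entropy(d: float, eps: float = 1e-12) -> float:
    # Entropy of Bernoulli(d); maximal (log 2) when d == 0.5, i.e. domain-confusing.
    return -(d * math.log(d + eps) + (1.0 - d) * math.log(1.0 - d + eps))


def transferability_weighted_attention(
    cls_attn: List[float], domain_probs: List[float]
) -> List[float]:
    """Hypothetical TAM-style reweighting of class-token-to-patch attention.

    cls_attn: attention weights from the class token to each patch (sum to 1).
    domain_probs: assumed per-patch discriminator probability of "source domain".
    """
    if len(cls_attn) != len(domain_probs):
        raise ValueError("one domain probability per patch is required")
    # Patches the discriminator cannot separate get a high transferability score.
    scores = [binary_entropy(d) for d in domain_probs]
    weighted = [a * s for a, s in zip(cls_attn, scores)]
    total = sum(weighted)
    if total == 0.0:
        # Every patch is perfectly domain-specific; fall back to the original attention.
        return list(cls_attn)
    # Renormalize (an assumption of this sketch) so the weights still sum to 1.
    return [w / total for w in weighted]
```

For example, with equal initial attention `[0.5, 0.5]` and discriminator outputs `[0.5, 0.99]`, the first patch (indistinguishable across domains) ends up with most of the attention.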
Abstract
Unsupervised domain adaptation (UDA) aims to transfer the knowledge learnt from a labeled source domain to an unlabeled target domain. Previous work is mainly built upon convolutional neural networks (CNNs) to learn domain-invariant representations. Despite the recent exponential increase in applying Vision Transformer (ViT) to vision tasks, the capability of ViT to adapt cross-domain knowledge remains unexplored in the literature. To fill this gap, this paper first comprehensively investigates the transferability of ViT on a variety of domain adaptation tasks. Surprisingly, ViT demonstrates superior transferability over its CNN-based counterparts by a large margin, and the performance can be further improved by incorporating adversarial adaptation. Notwithstanding, directly using CNN-based adaptation strategies fails to take advantage of ViT's intrinsic merits…
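Adversarial adaptation is often implemented DANN-style: a domain discriminator is trained to tell source from target features, while a gradient reversal step sends the negated, scaled gradient back to the feature extractor so its features become harder to classify by domain. The pure-Python sketch below shows that gradient flow for a single scalar feature and a logistic discriminator, plus the ramp-up schedule for the reversal coefficient from the original DANN work. The scalar setup and function names are illustrative assumptions, not the TVT implementation.

```python
import math
from typing import Tuple


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def grl_coefficient(progress: float, gamma: float = 10.0) -> float:
    # DANN-style schedule: ramps from 0 toward 1 as training progress goes 0 -> 1.
    return 2.0 / (1.0 + math.exp(-gamma * progress)) - 1.0


def adversarial_grads(
    feature: float, domain_label: int, w: float, b: float, lam: float
) -> Tuple[float, float, float]:
    """Return (dL/dw, dL/db, reversed feature gradient) for one scalar feature.

    L is the binary cross-entropy of a hypothetical logistic domain
    discriminator sigmoid(w * feature + b); label 1 = source, 0 = target.
    """
    p = sigmoid(w * feature + b)
    err = p - domain_label  # dL/dlogit for BCE on a sigmoid output
    grad_w = err * feature  # discriminator descends on these to get better at the task
    grad_b = err
    grad_feature = err * w  # true dL/dfeature
    # Gradient reversal: the feature extractor receives -lam * dL/dfeature,
    # so a descent step on the feature increases the discriminator's loss.
    return grad_w, grad_b, -lam * grad_feature
```

The reversal is why a single backward pass can train both players: the discriminator minimizes the domain loss while the features are pushed to maximize it, with `grl_coefficient` keeping the adversarial signal weak early in training.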
Videos
TVT: Transferable Vision Transformer for Unsupervised Domain Adaptation (YouTube)
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications · COVID-19 diagnosis using AI
Methods: Attention Is All You Need · Linear Layer · Temporal Adaptive Module · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Residual Connection · Byte Pair Encoding · Adam · Multi-Head Attention · Dense Connections
