TVT: Transferable Vision Transformer for Unsupervised Domain Adaptation

Jinyu Yang; Jingjing Liu; Ning Xu; Junzhou Huang

arXiv:2108.05988 · cs.CV · November 29, 2021 · 24 cites

TL;DR

This paper introduces TVT, a novel Vision Transformer-based framework for unsupervised domain adaptation, demonstrating superior transferability and performance over CNN-based methods through a specialized transferability adaptation module and discriminative clustering.

Contribution

The paper proposes the Transferable Vision Transformer (TVT), a new framework that leverages ViT's intrinsic features for improved domain adaptation, including a novel Transferability Adaption Module (TAM).

Findings

01. ViT shows superior transferability over CNNs in UDA tasks.

02. TVT achieves significant improvements over state-of-the-art methods.

03. Incorporating TAM enhances ViT's focus on transferable features.
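
To make the third finding more concrete, here is a minimal sketch of how attention could be steered toward transferable patches. It rests on assumptions that this page does not confirm: that a patch-level domain discriminator outputs a probability for each patch, that the entropy of that probability serves as the transferability score (a patch the discriminator cannot place in either domain is treated as transferable), and that the reweighted attention is renormalized. The function names and inputs are hypothetical, and this is not the authors' implementation.

```python
import math


def binary_entropy(p, eps=1e-12):
    """Entropy (in nats) of a Bernoulli(p) domain prediction."""
    p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(p * math.log(p) + (1.0 - p) * math.log(1.0 - p))


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def transferability_weighted_attention(cls_scores, domain_probs):
    """Reweight class-token attention by an entropy-based transferability score.

    cls_scores:   raw attention logits from the class token to each patch
                  (hypothetical input).
    domain_probs: per-patch probability, from an assumed patch-level domain
                  discriminator, that the patch comes from the source domain.
    """
    attn = softmax(cls_scores)
    # Assumption: high discriminator entropy means the patch looks alike in
    # both domains, so it is treated as more transferable.
    transfer = [binary_entropy(p) for p in domain_probs]
    weighted = [a * t for a, t in zip(attn, transfer)]
    total = sum(weighted)
    # Assumption: renormalize so the weights still sum to 1.
    return [w / total for w in weighted]


# Three patches with equal raw attention; only the discriminator output differs.
weights = transferability_weighted_attention(
    cls_scores=[1.0, 1.0, 1.0],
    domain_probs=[0.5, 0.99, 0.9],
)
```

In this example the first patch, where the discriminator is maximally uncertain, receives the largest weight, and the patch it confidently assigns to one domain receives the smallest.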

Abstract

Unsupervised domain adaptation (UDA) aims to transfer the knowledge learnt from a labeled source domain to an unlabeled target domain. Previous work is mainly built upon convolutional neural networks (CNNs) to learn domain-invariant representations. Despite the recent exponential increase in applying Vision Transformer (ViT) to vision tasks, the capability of ViT in adapting cross-domain knowledge remains unexplored in the literature. To fill this gap, this paper first comprehensively investigates the transferability of ViT on a variety of domain adaptation tasks. Surprisingly, ViT demonstrates superior transferability over its CNN-based counterparts by a large margin, and the performance can be further improved by incorporating adversarial adaptation. Nevertheless, directly using CNN-based adaptation strategies fails to take advantage of ViT's intrinsic merits…

Code & Models

Repositories

uta-smile/TVT (PyTorch, official)

Videos

TVT: Transferable Vision Transformer for Unsupervised Domain Adaptation · YouTube

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications · COVID-19 diagnosis using AI

Methods: Attention Is All You Need · Linear Layer · Temporal Adaptive Module · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Residual Connection · Byte Pair Encoding · Adam · Multi-Head Attention · Dense Connections