Making the Best of Both Worlds: A Domain-Oriented Transformer for Unsupervised Domain Adaptation
Wenxuan Ma, Jinming Zhang, Shuang Li, Chi Harold Liu, Yulin Wang, Wei Li

TL;DR
This paper introduces a Domain-Oriented Transformer (DOT) that improves unsupervised domain adaptation by learning separate domain-specific representations and classifiers, addressing limitations of traditional feature alignment methods.
Contribution
The paper proposes a novel DOT model with dual domain-specific spaces and classifiers, enhancing target domain discriminability and reducing source bias in UDA.
Findings
Achieves state-of-the-art results on multiple benchmarks.
Effectively preserves domain-specific discriminability.
Utilizes contrastive alignment and pseudo-label refinement strategies.
Abstract
Extensive studies on Unsupervised Domain Adaptation (UDA) have propelled the deployment of deep learning from limited experimental datasets into real-world unconstrained domains. Most UDA approaches align features within a common embedding space and apply a shared classifier for target prediction. However, since a perfectly aligned feature space may not exist when the domain discrepancy is large, these methods suffer from two limitations. First, the coercive domain alignment deteriorates target domain discriminability due to lacking target label supervision. Second, the source-supervised classifier is inevitably biased to source data, thus it may underperform in target domain. To alleviate these issues, we propose to simultaneously conduct feature alignment in two individual spaces focusing on different domains, and create for each space a domain-oriented classifier tailored…
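The two-space idea in the abstract can be illustrated with a minimal sketch. It assumes an encoder has already produced, for one image, one feature in a source-oriented space and one in a target-oriented space; how those features are computed is not covered by the text above. The class name `DomainOrientedHeads`, the helper functions, the feature dimension, and the random initialization are all hypothetical choices for illustration, not the authors' implementation.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def linear(weights, bias, x):
    """Apply a (num_classes x dim) weight matrix and a bias to a feature vector."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


class DomainOrientedHeads:
    """Hypothetical sketch: one classifier per domain-oriented feature space."""

    def __init__(self, dim, num_classes, seed=0):
        rng = random.Random(seed)

        def init_weights():
            return [[rng.gauss(0.0, 0.1) for _ in range(dim)] for _ in range(num_classes)]

        # Separate parameters, so neither head is forced to share the other's decision boundary.
        self.w_src, self.b_src = init_weights(), [0.0] * num_classes
        self.w_tgt, self.b_tgt = init_weights(), [0.0] * num_classes

    def predict(self, feat_src_space, feat_tgt_space):
        """Return class probabilities from the source-oriented and target-oriented heads."""
        p_src = softmax(linear(self.w_src, self.b_src, feat_src_space))
        p_tgt = softmax(linear(self.w_tgt, self.b_tgt, feat_tgt_space))
        return p_src, p_tgt


# Toy usage: made-up 4-dimensional features for a single image, 3 classes.
heads = DomainOrientedHeads(dim=4, num_classes=3)
p_src, p_tgt = heads.predict([0.2, -0.1, 0.5, 0.3], [0.1, 0.4, -0.2, 0.6])
```

Keeping the two heads' parameters separate mirrors the abstract's motivation: a single source-supervised classifier is biased toward source data, whereas a classifier attached to the target-oriented space can be tailored to that domain.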
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Softmax · Position-Wise Feed-Forward Layer · Layer Normalization · Adam · Byte Pair Encoding · Label Smoothing · Residual Connection
