PCaM: A Progressive Focus Attention-Based Information Fusion Method for Improving Vision Transformer Domain Adaptation

Zelin Zang; Fei Wang; Liangyu Li; Jinlin Wu; Chunshui Zhao; Zhen Lei; Baigui Sun

arXiv:2506.17232 · cs.LG · June 24, 2025

TL;DR

This paper introduces PCaM, a cross-attention mechanism for Vision Transformers that progressively filters out background information so that attention concentrates on foreground objects, improving unsupervised domain adaptation performance across multiple datasets.

Contribution

The paper proposes a lightweight, architecture-agnostic progressive focus cross-attention mechanism with an attentional guidance loss for better foreground feature alignment in ViT-based UDA.

Findings

01 Achieves state-of-the-art results on multiple datasets

02 Significantly improves domain adaptation performance

03 Effectively enhances attention focus on task-relevant regions

Abstract

Unsupervised Domain Adaptation (UDA) aims to transfer knowledge from a labeled source domain to an unlabeled target domain. Recent UDA methods based on Vision Transformers (ViTs) have achieved strong performance through attention-based feature alignment. However, we identify a key limitation: foreground object mismatch, where the discrepancy in foreground object size and spatial distribution across domains weakens attention consistency and hampers effective domain alignment. To address this issue, we propose the Progressive Focus Cross-Attention Mechanism (PCaM), which progressively filters out background information during cross-attention, allowing the model to focus on and fuse discriminative foreground semantics across domains. We further introduce an attentional guidance loss that explicitly directs attention toward task-relevant regions, enhancing cross-domain attention…
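
The idea of progressively filtering background tokens during cross-attention can be illustrated with a small NumPy sketch. This is not the authors' implementation: the function name `progressive_focus_cross_attention`, the `keep_ratios` schedule, and the rule of keeping the target tokens that receive the most attention on average are all assumptions made for illustration, since the abstract does not specify the filtering criterion.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def progressive_focus_cross_attention(q_src, k_tgt, v_tgt,
                                      keep_ratios=(1.0, 0.5, 0.25)):
    """Hypothetical sketch: cross-attention that narrows its key set in stages.

    q_src: (n_q, d) queries from one domain.
    k_tgt, v_tgt: (n_k, d) keys and values from the other domain.
    keep_ratios: assumed schedule of the fraction of target tokens kept
                 at each stage (not taken from the paper).
    Returns the fused features (n_q, d) and the indices of the kept tokens.
    """
    n_k, d = k_tgt.shape
    scores = q_src @ k_tgt.T / np.sqrt(d)      # (n_q, n_k) scaled dot products
    active = np.arange(n_k)                    # indices of tokens still kept

    for ratio in keep_ratios:
        n_keep = max(1, int(round(ratio * n_k)))
        attn = softmax(scores[:, active], axis=-1)
        # Assumed saliency: mean attention each kept token receives.
        received = attn.mean(axis=0)
        top = np.argsort(-received)[:n_keep]
        active = np.sort(active[top])          # drop low-attention tokens

    # Final attention is renormalized over the surviving tokens only.
    attn = softmax(scores[:, active], axis=-1)
    fused = attn @ v_tgt[active]
    return fused, active


if __name__ == "__main__":
    q = np.ones((3, 4))
    k = np.zeros((8, 4))
    k[2] = 3.0                                 # two toy "foreground" tokens
    k[5] = 3.0
    v = np.arange(32, dtype=float).reshape(8, 4)
    fused, kept = progressive_focus_cross_attention(q, k, v)
    print(kept, fused[0])
```

In the toy example, tokens 2 and 5 are the only keys aligned with the queries, so they survive all three stages and the fused output is an equal mix of their value vectors. A real model would apply this per head inside a ViT block and would likely learn or soften the filtering rather than use a hard top-k cut.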
