Exploring Consistency in Cross-Domain Transformer for Domain Adaptive Semantic Segmentation
Kaihong Wang, Donghyun Kim, Rogerio Feris, Kate Saenko, and Margrit Betke

TL;DR
This paper introduces a novel attention consistency approach for domain adaptive transformers in semantic segmentation, aligning attention maps across domains and views to improve target domain performance.
Contribution
It proposes a cross-domain attention map consistency method that enhances domain adaptation in transformer-based semantic segmentation models.
Findings
Outperforms prior state-of-the-art methods on the GTAV-to-Cityscapes, Synthia-to-Cityscapes, and Cityscapes-to-ACDC benchmarks by 1.3, 0.6, and 1.1 percentage points, respectively.
Demonstrates effectiveness and generalizability through extensive experiments.
Abstract
While transformers have greatly boosted performance in semantic segmentation, domain adaptive transformers are not yet well explored. We identify that the domain gap can cause discrepancies in self-attention. Due to this gap, the transformer attends to spurious regions or pixels, which deteriorates accuracy on the target domain. We propose to perform adaptation on attention maps with cross-domain attention layers that share features between the source and the target domains. Specifically, we impose consistency between predictions from cross-domain attention and self-attention modules to encourage similar distribution in the attention and output of the model across domains, i.e., attention-level and output-level alignment. We also enforce consistency in attention maps between different augmented views to further strengthen the attention-based alignment. Combining these two components,…
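The attention-level and output-level consistency described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation. It assumes that the cross-domain attention takes queries from target features and keys/values from source features, that both domains have the same number of tokens, and that consistency is measured with a mean absolute difference. The abstract specifies none of these choices, and all names below (`attention`, `mean_abs_diff`, `source`, `target`) are hypothetical.

```python
import math

def softmax(row):
    # Numerically stable softmax over one row of scores.
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def matmul(a, b):
    # Plain matrix product for lists of lists.
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(a):
    return [list(col) for col in zip(*a)]

def attention(queries, keys, values):
    """Scaled dot-product attention; returns (attention_map, output)."""
    scale = math.sqrt(len(queries[0]))
    scores = matmul(queries, transpose(keys))
    attn = [softmax([s / scale for s in row]) for row in scores]
    return attn, matmul(attn, values)

def mean_abs_diff(a, b):
    """Mean absolute difference between two equally shaped matrices."""
    diffs = [abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb)]
    return sum(diffs) / len(diffs)

# Toy token features (3 tokens, 2 channels) standing in for encoder outputs.
source = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
target = [[0.9, 0.1], [0.2, 0.8], [1.1, 0.9]]

# Self-attention within the target domain.
self_attn, self_out = attention(target, target, target)

# Cross-domain attention (assumed here: target queries, source keys/values).
cross_attn, cross_out = attention(target, source, source)

# Assumed consistency terms: attention-level and output-level discrepancies.
attention_consistency = mean_abs_diff(self_attn, cross_attn)
output_consistency = mean_abs_diff(self_out, cross_out)

print(attention_consistency, output_consistency)
```

In this sketch both terms shrink to zero when the source and target features coincide, which is the intuition behind using them as alignment signals. A real training setup would compute them on learned features and backpropagate through a differentiable framework.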
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Advanced Neural Network Applications · Multimodal Machine Learning Applications
