Smoothing Matters: Momentum Transformer for Domain Adaptive Semantic Segmentation
Runfa Chen, Yu Rong, Shangmin Guo, Jiaqi Han, Fuchun Sun, Tingyang Xu, Wenbing Huang

TL;DR
This paper introduces a momentum-based smoothing technique and dynamic discrepancy measurement to improve domain adaptive semantic segmentation with Vision Transformers, addressing high-frequency noise issues and enhancing transferability.
Contribution
It proposes a novel momentum network and dynamic discrepancy measurement to enhance the transferability of local ViTs in domain adaptive semantic segmentation.
Findings
Outperforms state-of-the-art methods on sim2real benchmarks
Effectively reduces high-frequency noise in features and pseudo labels
Improves the transferability of local Vision Transformers
Abstract
Vision Transformer variants (ViTs), after their great success in computer vision, have also demonstrated great potential in domain adaptive semantic segmentation. Unfortunately, straightforwardly applying local ViTs in domain adaptive semantic segmentation does not bring the expected improvement. We find that the pitfall of local ViTs is due to the severe high-frequency components generated during both pseudo-label construction and feature alignment for target domains. These high-frequency components make the training of local ViTs very unsmooth and hurt their transferability. In this paper, we introduce a low-pass filtering mechanism, the momentum network, to smooth the learning dynamics of target domain features and pseudo labels. Furthermore, we propose a dynamic discrepancy measurement to align the distributions in the source and target domains via dynamic weights to evaluate…
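The low-pass effect of a momentum network can be illustrated with a minimal sketch. It assumes the momentum network is maintained as an exponential moving average (EMA) of the online network's parameters, which is a common way to build such networks; the abstract does not state the exact update rule. The names `ema_update`, `smooth_trajectory`, `total_variation`, and the `decay` value are hypothetical and chosen only for illustration.

```python
def ema_update(momentum, online, decay=0.9):
    """One EMA step: blend the momentum parameters toward the online ones.

    Assumption: the momentum network follows an EMA rule; `decay` is a
    hypothetical hyperparameter, not a value taken from the paper.
    """
    if not 0.0 <= decay <= 1.0:
        raise ValueError("decay must lie in [0, 1]")
    return [decay * m + (1.0 - decay) * o for m, o in zip(momentum, online)]


def smooth_trajectory(online_steps, decay=0.9):
    """Apply the EMA update along a sequence of online parameter vectors."""
    momentum = list(online_steps[0])  # initialise from the first online state
    history = [momentum]
    for online in online_steps[1:]:
        momentum = ema_update(momentum, online, decay)
        history.append(momentum)
    return history


def total_variation(history):
    """Sum of step-to-step changes of the first parameter (a roughness proxy)."""
    return sum(abs(b[0] - a[0]) for a, b in zip(history, history[1:]))


# Toy online parameter that oscillates around 1.0 at every step,
# standing in for high-frequency noise in the training dynamics.
online_steps = [[1.0 + (0.5 if t % 2 == 0 else -0.5)] for t in range(20)]
smoothed = smooth_trajectory(online_steps, decay=0.9)

print("raw roughness:     ", total_variation(online_steps))
print("smoothed roughness:", total_variation(smoothed))
```

In this toy setting the smoothed trajectory changes far less from step to step than the raw one, which is the sense in which an EMA acts as a low-pass filter. A larger `decay` suppresses more of the oscillation but makes the momentum parameters slower to track genuine changes in the online network.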
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Advanced Neural Network Applications · Face recognition and analysis
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Byte Pair Encoding · Dropout · Layer Normalization · Adam · Label Smoothing · Absolute Position Encodings · Position-Wise Feed-Forward Layer
