MTTrans: Cross-Domain Object Detection with Mean-Teacher Transformer
Jinze Yu, Jiaming Liu, Xiaobao Wei, Haoyi Zhou, Yohei Nakata, Denis Gudovskiy, Tomoyuki Okuno, Jianxin Li, Kurt Keutzer, Shanghang Zhang

TL;DR
This paper introduces MTTrans, a cross-domain object detection method using a mean teacher transformer framework with multi-level feature alignment, significantly improving performance in unsupervised domain adaptation scenarios.
Contribution
The paper proposes a novel end-to-end cross-domain detection transformer with multi-level feature alignment and a mean teacher framework, enhancing unsupervised domain adaptation in object detection.
Findings
Achieves state-of-the-art results in three domain adaptation scenarios.
Significant performance boost from 52.6 to 57.9 mAP in the Sim10k to Cityscapes scenario.
Effective use of unlabeled target data through pseudo-labeling and feature alignment.
Abstract
Recently, DEtection TRansformer (DETR), an end-to-end object detection pipeline, has achieved promising performance. However, it requires large-scale labeled data and suffers from domain shift, especially when no labeled data is available in the target domain. To solve this problem, we propose MTTrans, an end-to-end cross-domain detection Transformer based on the mean teacher framework, which can fully exploit unlabeled target domain data in object detection training and transfer knowledge between domains via pseudo labels. We further propose comprehensive multi-level feature alignment, which takes advantage of the cross-scale self-attention mechanism in Deformable DETR, to improve the pseudo labels generated by the mean teacher framework. Image and object features are aligned at the local, global, and instance levels with domain query-based feature alignment (DQFA), bi-level graph-based…
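The mean teacher mechanics that the abstract relies on can be illustrated with a minimal, framework-free sketch. It assumes the standard mean teacher recipe: the teacher's weights are an exponential moving average (EMA) of the student's weights, and only teacher detections above a confidence threshold are kept as pseudo labels for unlabeled target images. The names `ema_update`, `filter_pseudo_labels`, and `Detection`, the flat parameter dictionaries, and the `decay` and `threshold` values are hypothetical illustrations, not taken from the MTTrans paper or its code.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Hypothetical flat parameter store (illustration only): parameter name -> list of weights.
Params = Dict[str, List[float]]


def ema_update(teacher: Params, student: Params, decay: float = 0.999) -> Params:
    """Return new teacher weights: decay * teacher + (1 - decay) * student.

    The teacher is not trained by backpropagation; it only tracks the
    student through this exponential moving average (EMA).
    """
    return {
        name: [decay * t + (1.0 - decay) * s for t, s in zip(weights, student[name])]
        for name, weights in teacher.items()
    }


@dataclass
class Detection:
    label: str
    score: float  # teacher confidence in [0, 1]
    box: Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def filter_pseudo_labels(dets: List[Detection], threshold: float = 0.7) -> List[Detection]:
    """Keep only confident teacher detections as pseudo labels (threshold is a hypothetical value)."""
    return [d for d in dets if d.score >= threshold]


if __name__ == "__main__":
    teacher = {"w": [0.0, 1.0]}
    student = {"w": [1.0, 1.0]}
    # With decay=0.5 the teacher moves halfway toward the student.
    print(ema_update(teacher, student, decay=0.5))  # {'w': [0.5, 1.0]}

    dets = [
        Detection("car", 0.92, (10.0, 20.0, 50.0, 60.0)),
        Detection("car", 0.35, (5.0, 5.0, 15.0, 15.0)),
    ]
    # Only the 0.92-confidence detection passes the default threshold.
    print(filter_pseudo_labels(dets))
```

In a real detector, the same EMA rule would be applied to every weight tensor of the teacher after each student optimization step, and the filtered teacher predictions on target images would serve as training targets for the student. A decay close to 1 keeps the teacher stable, which is what makes its pseudo labels less noisy than the student's own predictions.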
Taxonomy
Topics: Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications
Methods: Attention Is All You Need · Linear Layer · Deformable Attention Module · Deformable DETR · Absolute Position Encodings · Multi-Head Attention · Layer Normalization · Residual Connection · Softmax · Label Smoothing
