Cross-domain Detection Transformer based on Spatial-aware and Semantic-aware Token Alignment
Jinhong Deng, Xiaoyue Zhang, Wen Li, Lixin Duan

TL;DR
This paper introduces SSTA, a novel cross-domain detection transformer method that aligns tokens spatially and semantically using cross-attention maps, significantly improving cross-domain object detection performance.
Contribution
The paper proposes a token alignment method that uses cross-attention to align tokens according to their spatial and semantic information, improving how detection transformers adapt across domains.
Findings
Outperforms existing state-of-the-art methods on multiple benchmarks.
Effectively aligns tokens across domains using spatial and semantic cues.
Demonstrates improved generalization in cross-domain object detection.
Abstract
Detection transformers like DETR have recently shown promising performance on many object detection tasks, but their generalization ability remains limited in cross-domain adaptation scenarios. To address the cross-domain issue, a straightforward way is to perform token alignment with adversarial training in transformers. However, the performance of this approach is often unsatisfactory, as the tokens in detection transformers are quite diverse and represent different spatial and semantic information. In this paper, we propose a new method called Spatial-aware and Semantic-aware Token Alignment (SSTA) for cross-domain detection transformers. In particular, we take advantage of the characteristics of cross-attention as used in detection transformers and propose the spatial-aware token alignment (SpaTA) and the semantic-aware token alignment (SemTA) strategies to guide the token…
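The "straightforward" baseline the abstract mentions, token alignment with adversarial training, can be illustrated with a minimal sketch of the token-level domain-classification loss. Everything named here is an assumption made for illustration: the function `token_domain_loss`, the per-token discriminator logits, and especially the optional `weights` argument, which shows one hypothetical way a per-token score (for example, one derived from cross-attention) could re-weight the loss. It is not the paper's SpaTA or SemTA formulation, which the truncated abstract does not spell out.

```python
import math


def sigmoid(x):
    """Numerically plain logistic function; fine for the small logits used here."""
    return 1.0 / (1.0 + math.exp(-x))


def token_domain_loss(logits, domain_label, weights=None):
    """Weighted binary cross-entropy over per-token domain logits.

    logits:       one logit per token from a (hypothetical) domain discriminator
    domain_label: 0 for source-domain tokens, 1 for target-domain tokens
    weights:      optional non-negative per-token weights (assumption: these
                  could come from attention scores); uniform when omitted

    In adversarial alignment the discriminator minimizes this loss while the
    feature extractor is trained to maximize it, typically through a gradient
    reversal layer. Only the loss value is computed here.
    """
    if weights is None:
        weights = [1.0] * len(logits)
    if len(weights) != len(logits):
        raise ValueError("weights and logits must have the same length")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")

    loss = 0.0
    for z, w in zip(logits, weights):
        # Clamp the probability to avoid log(0).
        p = min(max(sigmoid(z), 1e-12), 1.0 - 1e-12)
        bce = -(domain_label * math.log(p) + (1 - domain_label) * math.log(1.0 - p))
        loss += w * bce
    return loss / total_weight


# Uniform weighting: every token contributes equally to the domain loss.
uniform = token_domain_loss([2.0, -2.0], domain_label=1)

# Hypothetical weighting: only the second token (which the discriminator
# misclassifies) contributes, so the loss is larger.
focused = token_domain_loss([2.0, -2.0], domain_label=1, weights=[0.0, 1.0])
```

With uniform weights every token is treated alike, which is the situation the abstract describes as unsatisfactory because tokens carry different spatial and semantic information. The weighted call only shows how token-specific emphasis would change the loss under the stated assumption.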
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · COVID-19 diagnosis using AI · Advanced Neural Network Applications
Methods: Attention Is All You Need · Linear Layer · Layer Normalization · Softmax · Dense Connections · Position-Wise Feed-Forward Layer · Byte Pair Encoding · Feedforward Network · Multi-Head Attention · Absolute Position Encodings
