CGSA: Class-Guided Slot-Aware Adaptation for Source-Free Object Detection

Boyang Dai; Zeng Fan; Zihao Qi; Meng Lou; Yizhou Yu

arXiv:2602.22621·cs.CV·February 27, 2026

Open Access · 3 Reviews

TL;DR

CGSA introduces an object-centric, slot-aware framework for source-free domain adaptive object detection, leveraging hierarchical slot awareness and class-guided contrast to improve cross-domain detection without source data.
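To make "class-guided contrast" concrete, here is a minimal sketch of a generic InfoNCE-style loss that pulls a slot vector toward one class prototype and away from the others. This is an assumption for illustration only: the summary does not say how CGSA builds its contrast, and `info_nce`, `prototypes`, and `temperature` are hypothetical names and values.

```python
import math

def cosine(a, b):
    """Cosine similarity of two non-zero vectors (zero vectors would divide by zero)."""
    num = sum(x * y for x, y in zip(a, b))
    den = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return num / den

def info_nce(slot, prototypes, positive_index, temperature=0.1):
    """Generic InfoNCE loss: -log softmax(similarities)[positive_index].

    Hypothetical illustration, not the CGSC module from the paper.
    """
    logits = [cosine(slot, p) / temperature for p in prototypes]
    # Log-sum-exp with the max subtracted for numerical stability.
    m = max(logits)
    log_denominator = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_denominator - logits[positive_index]

# Three made-up class prototypes and one slot that lies close to class 0.
prototypes = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
slot = [0.9, 0.2]

loss_match = info_nce(slot, prototypes, positive_index=0)
loss_mismatch = info_nce(slot, prototypes, positive_index=1)
print(loss_match, loss_mismatch)
```

The loss is small when the slot already sits near its assigned prototype and large when it is assigned to a dissimilar one, which is the behaviour a contrastive objective of this kind relies on.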

Contribution

CGSA is the first framework to bring object-centric learning, in the form of slot-aware adaptation, into SF-DAOD, with the aim of improving domain invariance and semantic consistency.

Findings

01

Outperforms previous SF-DAOD methods on multiple datasets.

02

Demonstrates the effectiveness of object-centric design in domain adaptation.

03

Provides theoretical analysis supporting the proposed components.

Abstract

Source-Free Domain Adaptive Object Detection (SF-DAOD) aims to adapt a detector trained on a labeled source domain to an unlabeled target domain without retaining any source data. Despite recent progress, most popular approaches focus on tuning pseudo-label thresholds or refining the teacher-student framework, while overlooking object-level structural cues within cross-domain data. In this work, we present CGSA, the first framework that brings Object-Centric Learning (OCL) into SF-DAOD by integrating slot-aware adaptation into the DETR-based detector. Specifically, our approach integrates a Hierarchical Slot Awareness (HSA) module into the detector to progressively disentangle images into slot representations that act as visual priors. These slots are then guided toward class semantics via a Class-Guided Slot Contrast (CGSC) module, maintaining semantic consistency and prompting…
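As background for the slot representations mentioned in the abstract, the following is a minimal sketch of one update of the generic slot-attention mechanism from the object-centric learning literature. It is not the HSA module, whose details are not given here. The learned projections, GRU, and MLP of the full mechanism are deliberately omitted, and the feature and slot values are made up.

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def slot_attention_step(slots, feats, eps=1e-8):
    """One simplified slot-attention update (no learned projections, GRU, or MLP)."""
    dim = len(slots[0])
    scale = 1.0 / math.sqrt(dim)
    # attn[i][k]: how strongly feature i is assigned to slot k.
    # The softmax runs over slots, so slots compete for each feature.
    attn = [softmax([scale * dot(f, s) for s in slots]) for f in feats]
    new_slots = []
    for k in range(len(slots)):
        weights = [attn[i][k] for i in range(len(feats))]
        total = sum(weights) + eps
        # Each slot becomes the attention-weighted mean of the features it won.
        new_slots.append([
            sum(w * f[d] for w, f in zip(weights, feats)) / total
            for d in range(dim)
        ])
    return new_slots, attn

# Toy input: two groups of features and two slots (illustrative values only).
feats = [[4.0, 0.0], [3.6, 0.4], [0.0, 4.0], [0.4, 3.6]]
slots = [[0.5, 0.0], [0.0, 0.5]]
for _ in range(3):
    slots, attn = slot_attention_step(slots, feats)
print(slots)
```

The point that distinguishes this from standard attention is the normalization axis: the softmax is taken across slots for each feature, so each slot ends up summarizing a different group of features.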

Peer Reviews

Decision·ICLR 2026 Poster

Reviewer 01 · Rating 2 · Confidence 4

Strengths

The authors conduct experiments on multiple benchmark datasets, showing that their method achieves better performance than previous approaches. The paper includes theoretical discussions supporting the generalization ability of the proposed method.

Weaknesses

The paper’s description of the proposed method, especially the role of slot attention, is unclear. From the current presentation, it appears that slot attention is applied to the queries in DETR. However, it is not clear how this differs in essence from standard attention mechanisms. The authors should provide a clear comparison or ablation study to demonstrate why slot attention is necessary in this context. The paper claims that slot attention helps capture object-level features. However, the

Reviewer 02 · Rating 6 · Confidence 4

Strengths

* The paper is easy to follow.
* Using a slot-aware framework for object-level alignment is a reasonable approach.
* Experiments on five cross-domain settings show the effectiveness of the proposed method.

Weaknesses

1. The performance of the base model (e.g., RT-DETR) should be reported for better evaluation.
2. As shown in Figure 3, the HSA module adapts the pre-trained DINO model. Therefore, the additional computation overhead needs to be analyzed.
3. Since the method uses a pre-trained model to inject feature knowledge, it is recommended to add a comparison with existing methods that use VLMs for knowledge injection, such as [1][2]. And recent DETR-based SFOD methods should also be discus

Reviewer 03 · Rating 6 · Confidence 4

Strengths

1. The method achieves state-of-the-art performance across multiple datasets.
2. The authors provide a solid theoretical analysis explaining why slot-based features can offer domain-invariant priors.

Weaknesses

1. The HSA module introduces additional parameters and computation overhead. It would be beneficial to provide a comparison of speed and parameter size before and after adding HSA.
2. The authors did not provide the performance of the source-only model after adding HSA. This omission makes it unclear whether the performance improvement stems from a stronger source-only model or from HSA enhancing the adaptation capability.
3. HSA requires self-supervised pretraining on the COCO dataset. It wou

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Advanced Neural Network Applications · Face recognition and analysis