Towards Unbiased Source-Free Object Detection via Vision Foundation Models
Zhi Cai, Yingjie Gao, Yanan Zhang, Xinzhu Ma, Di Huang

TL;DR
This paper introduces DSOD, a framework that leverages vision foundation models (VFMs) to reduce source bias in source-free object detection (SFOD) and improve cross-domain detection performance.
Contribution
The paper proposes a VFM-assisted SFOD framework with modules for feature integration and regularization, plus a VFM-free variant for resource-constrained scenarios, advancing the state of the art in unbiased source-free detection.
Findings
DSOD achieves 48.1% AP on Normal-to-Foggy adaptation.
DSOD outperforms existing SFOD methods on multiple benchmarks.
The VFM-assisted approach effectively mitigates source bias.
Abstract
Source-Free Object Detection (SFOD) has garnered much attention in recent years by eliminating the need for source-domain data in cross-domain tasks. However, existing SFOD methods suffer from the Source Bias problem, i.e., the adapted model remains skewed towards the source domain, leading to poor generalization and error accumulation during self-training. To overcome this challenge, we propose Debiased Source-free Object Detection (DSOD), a novel VFM-assisted SFOD framework that can effectively mitigate source bias with the help of powerful VFMs. Specifically, we propose a Unified Feature Injection (UFI) module that integrates VFM features into the CNN backbone through Simple-Scale Extension (SSE) and Domain-aware Adaptive Weighting (DAAW). Then, we propose Semantic-aware Feature Regularization (SAFR), which constrains feature learning to prevent overfitting to source-domain characteristics.…
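The abstract does not say how UFI combines the two feature streams, so the following is only a rough sketch of the general idea of injecting VFM features into a CNN feature map with a weighting term. Everything in it is an assumption made for illustration: the function names `resize_nearest` and `inject_features`, the nearest-neighbour resize standing in for scale alignment, the scalar sigmoid gate standing in for a domain-aware weight, and the requirement that both maps have the same channel count. It is not the authors' implementation of SSE or DAAW.

```python
import math


def resize_nearest(fmap, out_h, out_w):
    """Nearest-neighbour resize of a feature map stored as nested lists [C][H][W]."""
    in_h, in_w = len(fmap[0]), len(fmap[0][0])
    return [
        [
            [channel[i * in_h // out_h][j * in_w // out_w] for j in range(out_w)]
            for i in range(out_h)
        ]
        for channel in fmap
    ]


def inject_features(cnn_feat, vfm_feat, gate_logit):
    """Blend VFM features into CNN features with a scalar gate in (0, 1).

    Hypothetical stand-in for a learned weighting: a real system would
    presumably predict the weight from the features and project channels
    so that both maps are compatible.
    """
    if len(cnn_feat) != len(vfm_feat):
        raise ValueError("this sketch assumes equal channel counts")
    height, width = len(cnn_feat[0]), len(cnn_feat[0][0])
    vfm_resized = resize_nearest(vfm_feat, height, width)
    gate = 1.0 / (1.0 + math.exp(-gate_logit))  # sigmoid
    return [
        [
            [(1.0 - gate) * c + gate * v for c, v in zip(c_row, v_row)]
            for c_row, v_row in zip(c_ch, v_ch)
        ]
        for c_ch, v_ch in zip(cnn_feat, vfm_resized)
    ]


# Toy inputs: a 2-channel 4x4 CNN map and a 2-channel 2x2 VFM map.
cnn_map = [[[1.0] * 4 for _ in range(4)] for _ in range(2)]
vfm_map = [[[3.0] * 2 for _ in range(2)] for _ in range(2)]
fused = inject_features(cnn_map, vfm_map, gate_logit=0.0)
```

With `gate_logit=0.0` the gate is 0.5, so the two streams are averaged; a strongly negative logit leaves the CNN features almost unchanged, which is the behaviour one would want when the VFM features are judged unhelpful.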
Taxonomy
Topics: Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning · Visual Attention and Saliency Detection
