VFM$^{4}$SDG: Unveiling the Power of VFMs for Single-Domain Generalized Object Detection

Yupeng Zhang; Ruize Han; Ningnan Guo; Wei Feng; Song Wang; Liang Wan

arXiv:2604.21502 · cs.CV · April 24, 2026

TL;DR

This paper introduces VFM$^{4}$SDG, a novel framework leveraging a frozen vision foundation model to improve single-domain generalized object detection under diverse and unseen environmental conditions.

Contribution

The paper proposes a dual-prior learning approach that integrates a frozen vision foundation model (VFM) into both the encoding and decoding stages of the detector, improving its stability across domains.

Findings

01 Outperforms state-of-the-art methods on SDGOD benchmarks

02 Improves robustness of object-background and inter-instance relations

03 Enhances semantic recognition and spatial localization stability

Abstract

In real-world scenarios, continual changes in weather, illumination, and imaging conditions cause significant domain shifts, leading detectors trained on a single source domain to degrade severely in unseen environments. Existing single-domain generalized object detection (SDGOD) methods mainly rely on data augmentation or domain-invariant representation learning, but pay limited attention to detector mechanisms, leaving clear limitations under complex domain shifts. Through analytical experiments, we find that performance degradation is dominated by increasing missed detections, which fundamentally arises from reduced cross-domain stability of the detector: object-background and inter-instance relations become less stable in the encoding stage, while semantic-spatial alignment of query representations also becomes harder to maintain in the decoding stage. To this end, we propose…
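
The abstract attributes cross-domain degradation mainly to missed detections. As a rough illustration of what that quantity measures, the sketch below counts ground-truth boxes that no prediction matches at a given IoU threshold. It is not the paper's analysis protocol: the function names `iou` and `miss_rate`, the greedy one-to-one matching, the 0.5 threshold, and the toy boxes are all assumptions made for this example, and class labels and confidence scores are ignored for brevity.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), hypothetical layout


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def miss_rate(gt: List[Box], preds: List[Box], thr: float = 0.5) -> float:
    """Fraction of ground-truth boxes left unmatched (assumed greedy matching)."""
    if not gt:
        return 0.0
    used = set()  # indices of predictions already assigned to a ground truth
    missed = 0
    for g in gt:
        best_j, best_iou = -1, thr
        for j, p in enumerate(preds):
            if j in used:
                continue
            v = iou(g, p)
            if v >= best_iou:
                best_j, best_iou = j, v
        if best_j < 0:
            missed += 1  # no unused prediction overlaps this object enough
        else:
            used.add(best_j)
    return missed / len(gt)


# Toy data: the same two objects, detected fully in the "source" domain
# and only partially in an "unseen" domain.
gt_boxes = [(0, 0, 10, 10), (20, 20, 30, 30)]
source_preds = [(0, 0, 10, 10), (20, 20, 30, 30)]
unseen_preds = [(1, 1, 10, 10)]

source_miss = miss_rate(gt_boxes, source_preds)
unseen_miss = miss_rate(gt_boxes, unseen_preds)
print(source_miss, unseen_miss)
```

Comparing this rate between the source domain and an unseen domain gives a simple view of how much of a performance drop comes from objects that are not detected at all, as opposed to objects that are detected but poorly localized or misclassified.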
