VFM-Guided Semi-Supervised Detection Transformer under Source-Free Constraints for Remote Sensing Object Detection

Jianhong Han; Yupei Wang; Liang Chen

arXiv:2508.11167·cs.CV·August 27, 2025

TL;DR

This paper introduces VG-DETR, a semi-supervised, source-free remote sensing object detection method that leverages a vision foundation model to improve pseudo-label quality and feature robustness without access to source data.

Contribution

The paper proposes a semi-supervised framework, guided by a vision foundation model (VFM), for source-free remote sensing object detection. It integrates the VFM's semantic priors with dual-level alignment.

Findings

01. VG-DETR outperforms existing methods in remote sensing detection tasks.

02. VFM-guided pseudo-label mining improves both the accuracy and the number of pseudo-labels.

03. Dual-level alignment makes the learned features more robust to domain gaps.

Abstract

Unsupervised domain adaptation methods have been widely explored to bridge domain gaps. However, in real-world remote-sensing scenarios, privacy and transmission constraints often preclude access to source domain data, which limits their practical applicability. Recently, Source-Free Object Detection (SFOD) has emerged as a promising alternative, aiming at cross-domain adaptation without relying on source data, primarily through a self-training paradigm. Despite its potential, SFOD frequently suffers from training collapse caused by noisy pseudo-labels, especially in remote sensing imagery with dense objects and complex backgrounds. Considering that limited target domain annotations are often feasible in practice, we propose a Vision foundation-Guided DEtection TRansformer (VG-DETR), built upon a semi-supervised framework for SFOD in remote sensing images. VG-DETR integrates a Vision…
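The self-training paradigm mentioned in the abstract can be illustrated with a minimal sketch. This is a generic teacher-student loop component, not the VG-DETR procedure: the `Detection` type, the `filter_pseudo_labels` and `ema_update` helpers, the 0.7 threshold, and the momentum value are all hypothetical names and numbers chosen for illustration. The sketch assumes a teacher model whose confident detections become pseudo-labels for the student, and whose weights are refreshed as an exponential moving average of the student's.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Detection:
    """One teacher prediction (hypothetical container, not from the paper)."""
    box: Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels
    label: int
    score: float  # teacher confidence in [0, 1]


def filter_pseudo_labels(detections: List[Detection],
                         threshold: float = 0.7) -> List[Detection]:
    """Keep only detections whose confidence is at least `threshold`.

    Assumption: a fixed score threshold. A low threshold admits noisy
    labels; a high one discards many true objects in dense scenes.
    """
    return [d for d in detections if d.score >= threshold]


def ema_update(teacher: Dict[str, float],
               student: Dict[str, float],
               momentum: float = 0.999) -> Dict[str, float]:
    """Return teacher weights moved toward the student by an EMA step.

    Assumption: weights are plain floats keyed by name, standing in for
    real model tensors.
    """
    return {name: momentum * teacher[name] + (1.0 - momentum) * student[name]
            for name in teacher}


# Usage: two teacher detections, only the confident one becomes a pseudo-label.
teacher_out = [
    Detection((0.0, 0.0, 10.0, 10.0), label=1, score=0.9),
    Detection((5.0, 5.0, 20.0, 20.0), label=2, score=0.4),
]
pseudo_labels = filter_pseudo_labels(teacher_out, threshold=0.7)
new_teacher = ema_update({"w": 1.0}, {"w": 0.0}, momentum=0.9)
```

Because the teacher is updated from a student trained on the teacher's own outputs, wrong pseudo-labels that pass the filter feed back into later rounds, which is one way to picture the training collapse the abstract describes.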
