DOD-SA: Infrared-Visible Decoupled Object Detection with Single-Modality Annotations

Hang Jin; Chenqiang Gao; Junjie Guo; Fangcen Liu; Kanghui Tian; Qinyao Chang

arXiv:2508.10445·cs.CV·August 15, 2025

TL;DR

This paper introduces DOD-SA, an infrared-visible object detection framework that reduces annotation costs by training from single-modality annotations with cross-modality knowledge transfer. The authors report that it outperforms state-of-the-art methods on the DroneVehicle dataset.

Contribution

The paper proposes a decoupled detection framework built on a collaborative teacher-student network (CoSD-TSNet), together with a progressive training strategy, so that detection results for both modalities can be learned from annotations of only one modality.

Findings

01 Outperforms state-of-the-art methods on the DroneVehicle dataset

02 Transfers knowledge between modalities through pseudo-labels

03 Reduces annotation costs while maintaining detection accuracy

Abstract

Infrared-visible object detection has shown great potential in real-world applications, enabling robust all-day perception by leveraging the complementary information of infrared and visible images. However, existing methods typically require dual-modality annotations to output detection results for both modalities during prediction, which incurs high annotation costs. To address this challenge, we propose a novel infrared-visible Decoupled Object Detection framework with Single-modality Annotations, called DOD-SA. The architecture of DOD-SA is built upon a Single- and Dual-Modality Collaborative Teacher-Student Network (CoSD-TSNet), which consists of a single-modality branch (SM-Branch) and a dual-modality decoupled branch (DMD-Branch). The teacher model generates pseudo-labels for the unlabeled modality, simultaneously supporting the training of the student model. The collaborative…
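To make the pseudo-labeling step described in the abstract more concrete, the following is a minimal sketch of one common way a teacher's predictions can be turned into pseudo-labels for an unannotated modality: keeping only detections above a confidence threshold. This is an illustrative assumption, not the paper's method. The `Detection` type, the `select_pseudo_labels` function, the `min_score` threshold, and the sample detections are all hypothetical, and the abstract does not state how DOD-SA filters or refines its pseudo-labels.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Detection:
    """One predicted object (hypothetical structure, not from the paper)."""
    box: Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels
    label: str
    score: float  # teacher confidence in [0, 1]


def select_pseudo_labels(
    teacher_detections: List[Detection], min_score: float = 0.7
) -> List[Detection]:
    """Keep only teacher detections confident enough to act as pseudo-labels.

    Assumption: a fixed confidence threshold. The paper may use a different
    selection or refinement rule.
    """
    return [d for d in teacher_detections if d.score >= min_score]


# Hypothetical teacher output on an image from the unannotated modality.
teacher_output = [
    Detection((10.0, 20.0, 50.0, 60.0), "car", 0.92),
    Detection((15.0, 25.0, 55.0, 65.0), "truck", 0.41),
    Detection((100.0, 40.0, 140.0, 90.0), "bus", 0.78),
]

pseudo_labels = select_pseudo_labels(teacher_output)
print([d.label for d in pseudo_labels])  # prints ['car', 'bus']
```

In a teacher-student setup of this kind, the retained detections would stand in for ground-truth boxes when training the student on the unannotated modality. A higher threshold trades fewer pseudo-labels for less label noise.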
