On Modality Incomplete Infrared-Visible Object Detection: An Architecture Compatibility Perspective

Shuo Yang; Yinghui Xing; Shizhou Zhang; Zhilong Niu

arXiv:2511.06406·cs.CV·November 11, 2025


TL;DR

This paper introduces a flexible infrared-visible object detection (IVOD) architecture with a new neck module and training strategy, improving robustness to missing modalities, and establishes a comprehensive benchmark for modality-incomplete scenarios.

Contribution

We propose a plug-and-play Scarf Neck module with a modality-agnostic attention mechanism and a pseudo modality dropout strategy, enhancing IVOD model adaptability to incomplete modalities.

Findings

01

Scarf-DETR outperforms existing models in missing modality scenarios.

02

The proposed benchmark thoroughly evaluates modality-incomplete IVOD.

03

Scarf-DETR achieves superior results on standard IVOD benchmarks.

Abstract

Infrared and visible object detection (IVOD) is essential for numerous around-the-clock applications. Despite notable advances, current IVOD models suffer marked performance declines when confronted with incomplete modality data, particularly when the dominant modality is missing. In this paper, we thoroughly investigate the modality-incomplete IVOD problem from an architecture compatibility perspective. Specifically, we propose a plug-and-play Scarf Neck module for DETR variants, which introduces a modality-agnostic deformable attention mechanism that enables the IVOD detector to adapt flexibly to either a single modality or both modalities during training and inference. When training Scarf-DETR, we design a pseudo modality dropout strategy to fully utilize the multi-modality information, making the detector compatible with and robust to both single-modality and dual-modality working modes.…
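
The abstract does not spell out how the pseudo modality dropout strategy or the Scarf Neck work internally. As a rough illustration of the general idea only, the sketch below shows plain modality dropout: during training, one of the two inputs is occasionally withheld so the detector also sees single-modality samples, and a neck that is agnostic to modality simply consumes whichever token sets are present. The names `modality_dropout`, `available_tokens`, and `p_drop` are hypothetical and are not taken from the paper; this is not the authors' implementation.

```python
import random
from typing import List, Optional, Tuple

Tokens = List[float]  # stand-in for a feature map or token sequence


def modality_dropout(
    ir: Tokens,
    vis: Tokens,
    p_drop: float = 0.3,
    rng: Optional[random.Random] = None,
) -> Tuple[Optional[Tokens], Optional[Tokens]]:
    """Withhold at most one modality (assumed, generic scheme).

    With probability p_drop / 2 the infrared input is dropped, with
    probability p_drop / 2 the visible input is dropped, and otherwise
    both are kept. The two inputs are never dropped together.
    """
    if rng is None:
        rng = random.Random()
    u = rng.random()  # uniform in [0.0, 1.0)
    if u < p_drop / 2:
        return None, vis
    if u < p_drop:
        return ir, None
    return ir, vis


def available_tokens(ir: Optional[Tokens], vis: Optional[Tokens]) -> Tokens:
    """Concatenate the tokens of whichever modalities are present."""
    tokens: Tokens = []
    for feats in (ir, vis):
        if feats is not None:
            tokens.extend(feats)
    return tokens


if __name__ == "__main__":
    rng = random.Random(0)
    ir_feats, vis_feats = [0.1, 0.2], [0.3, 0.4, 0.5]
    for _ in range(5):
        ir, vis = modality_dropout(ir_feats, vis_feats, p_drop=0.5, rng=rng)
        print(ir is not None, vis is not None, len(available_tokens(ir, vis)))
```

Because the downstream step only ever sees a variable-length token list, the same code path serves infrared-only, visible-only, and dual-modality inputs, which is the kind of compatibility the abstract describes at a high level.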

Taxonomy

Topics: Advanced Neural Network Applications · Advanced Optical Sensing Technologies · CCD and CMOS Imaging Sensors