On Modality Incomplete Infrared-Visible Object Detection: An Architecture Compatibility Perspective
Shuo Yang, Yinghui Xing, Shizhou Zhang, Zhilong Niu

TL;DR
This paper introduces a flexible infrared-visible object detection (IVOD) architecture with a new module and training strategy that improve robustness to missing modalities, and it establishes a comprehensive benchmark for modality-incomplete scenarios.
Contribution
We propose a plug-and-play Scarf Neck module with a modality-agnostic attention mechanism and a pseudo modality dropout strategy, enhancing IVOD model adaptability to incomplete modalities.
Findings
Scarf-DETR outperforms existing models in missing modality scenarios.
The proposed benchmark thoroughly evaluates modality-incomplete IVOD.
Scarf-DETR achieves superior results on standard IVOD benchmarks.
Abstract
Infrared and visible object detection (IVOD) is essential for numerous around-the-clock applications. Despite notable advancements, current IVOD models suffer significant performance declines when confronted with incomplete modality data, particularly if the dominant modality is missing. In this paper, we thoroughly investigate the modality-incomplete IVOD problem from an architecture compatibility perspective. Specifically, we propose a plug-and-play Scarf Neck module for DETR variants, which introduces a modality-agnostic deformable attention mechanism to enable the IVOD detector to flexibly adapt to either single modality or to both modalities during training and inference. When training Scarf-DETR, we design a pseudo modality dropout strategy to fully utilize the multi-modality information, making the detector compatible with and robust to both single-modality and dual-modality working modes.…
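The abstract does not spell out how the pseudo modality dropout strategy works, so the following is only a minimal sketch of plain modality dropout: the generic idea of randomly hiding one modality during training so that a detector sees both single- and dual-modality inputs. The function names, the `p_drop` parameter, and the use of `None` to mark a missing modality are assumptions made for illustration, not the authors' implementation.

```python
import random

# Hypothetical mode names; the paper may organise its working modes differently.
MODES = ("both", "infrared_only", "visible_only")


def sample_mode(rng, p_drop=0.5):
    """Pick which modalities one training sample keeps.

    With probability p_drop one modality is dropped (chosen uniformly);
    otherwise both modalities are kept.
    """
    if rng.random() >= p_drop:
        return "both"
    return rng.choice(("infrared_only", "visible_only"))


def apply_modality_dropout(infrared, visible, mode):
    """Return the (infrared, visible) pair with the dropped modality set to None."""
    if mode == "infrared_only":
        return infrared, None
    if mode == "visible_only":
        return None, visible
    return infrared, visible


if __name__ == "__main__":
    rng = random.Random(0)  # seeded so runs are repeatable
    for _ in range(5):
        mode = sample_mode(rng)
        print(mode, apply_modality_dropout("ir_image", "rgb_image", mode))
```

In a real training loop the strings would be image tensors, and a detector that accepts a missing input would be fed the resulting pair directly.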
Taxonomy
Topics: Advanced Neural Network Applications · Advanced Optical Sensing Technologies · CCD and CMOS Imaging Sensors
