RGBX-DiffusionDet: A Framework for Multi-Modal RGB-X Object Detection Using DiffusionDet
Eliraz Orfaig, Inna Stainvas, Igal Bilik

TL;DR
RGBX-DiffusionDet is a novel multimodal object detection framework that effectively fuses heterogeneous 2D data with RGB images using adaptive attention mechanisms and multi-scale feature aggregation, outperforming RGB-only baselines.
Contribution
The paper introduces a new multimodal object detection framework with adaptive fusion modules and regularization losses, advancing the integration of diverse 2D sensing modalities into diffusion-based models.
Findings
Outperforms baseline RGB-only DiffusionDet on multiple datasets
Maintains decoding efficiency despite added multimodal modules
Provides new insights into multimodal data fusion for object detection
Abstract
This work introduces RGBX-DiffusionDet, an object detection framework that extends the DiffusionDet model to fuse heterogeneous 2D data (X) with RGB imagery via an adaptive multimodal encoder. To enable cross-modal interaction, we design a dynamic channel reduction within a convolutional block attention module (DCR-CBAM), which facilitates cross-talk between subnetworks by dynamically highlighting salient channel features. Furthermore, a dynamic multi-level aggregation block (DMLAB) is proposed to refine spatial feature representations through adaptive multiscale fusion. Finally, novel regularization losses that enforce channel saliency and spatial selectivity are introduced, leading to compact and discriminative feature embeddings. Extensive experiments were conducted using RGB-Depth (KITTI), a novel annotated RGB-Polarimetric dataset, and the RGB-Infrared (M3FD) benchmark dataset.…
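To make the idea of highlighting salient channel features across modalities concrete, here is a minimal NumPy sketch of standard CBAM-style channel attention applied to concatenated RGB and X feature maps. It is an illustration under stated assumptions, not the authors' DCR-CBAM: the function names (`channel_attention`, `fuse_rgb_x`), the fixed reduction ratio `r`, the random weights, and the concatenation-based fusion are all hypothetical choices, and the paper's dynamic channel reduction is not reproduced here.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def channel_attention(feat, w1, w2):
    """Generic CBAM-style channel attention (not the paper's DCR-CBAM).

    feat: feature map of shape (C, H, W)
    w1:   shared-MLP reduction weights of shape (C // r, C)
    w2:   shared-MLP expansion weights of shape (C, C // r)
    Returns one weight in (0, 1) per channel.
    """
    avg = feat.mean(axis=(1, 2))  # global average pooling -> (C,)
    mx = feat.max(axis=(1, 2))    # global max pooling -> (C,)

    def mlp(v):
        # Shared two-layer MLP with a ReLU bottleneck.
        return w2 @ np.maximum(w1 @ v, 0.0)

    return sigmoid(mlp(avg) + mlp(mx))


def fuse_rgb_x(rgb_feat, x_feat, w1, w2):
    """Assumed fusion: concatenate both modalities, then reweight channels."""
    stacked = np.concatenate([rgb_feat, x_feat], axis=0)  # (2C, H, W)
    weights = channel_attention(stacked, w1, w2)          # (2C,)
    return stacked * weights[:, None, None], weights


# Toy inputs: 8 channels per modality on a 4x4 grid, fixed reduction ratio 4.
rng = np.random.default_rng(0)
C, H, W, r = 8, 4, 4, 4
rgb_feat = rng.standard_normal((C, H, W))
x_feat = rng.standard_normal((C, H, W))
w1 = rng.standard_normal((2 * C // r, 2 * C)) * 0.1
w2 = rng.standard_normal((2 * C, 2 * C // r)) * 0.1

fused, weights = fuse_rgb_x(rgb_feat, x_feat, w1, w2)
print(fused.shape, weights.shape)
```

Because the attention weights are computed from both modalities jointly, a strongly activated channel in the X branch can influence how RGB channels are scaled, which is one simple way to obtain the kind of cross-talk the abstract describes.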
Taxonomy
Topics: Video Surveillance and Tracking Methods
Methods: Softmax · Attention Is All You Need
