Representation Space Constrained Learning with Modality Decoupling for Multimodal Object Detection
YiKang Shao, Tao Shi

TL;DR
This paper offers a theoretical analysis of fusion degradation in multimodal object detection and proposes a novel method, RSC-MD, to improve modality-specific backbone optimization, leading to state-of-the-art results.
Contribution
The paper introduces a theoretical framework for understanding fusion degradation and proposes RSC-MD, a method that improves the optimization of each modality-specific backbone by addressing gradient suppression and imbalanced modality learning.
Findings
RSC-MD effectively alleviates fusion degradation.
The method achieves state-of-the-art performance on multiple datasets.
Extensive experiments validate the approach's effectiveness.
Abstract
Multimodal object detection has attracted significant attention in both academia and industry for its enhanced robustness. Although numerous studies have focused on improving modality fusion strategies, most neglect fusion degradation, and none provide a theoretical analysis of its underlying causes. To fill this gap, this paper presents a systematic theoretical investigation of fusion degradation in multimodal detection and identifies two key optimization deficiencies: (1) the gradients of unimodal branch backbones are severely suppressed under multimodal architectures, resulting in under-optimization of the unimodal branches; (2) disparities in modality quality cause weaker modalities to experience stronger gradient suppression, which in turn results in imbalanced modality learning. To address these issues, this paper proposes a Representation Space Constrained Learning with Modality…
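The two deficiencies named in the abstract can be illustrated with a toy calculation. The sketch below is not the paper's analysis: it assumes a hypothetical two-branch classifier whose branch logits are summed before a sigmoid and trained with binary cross-entropy, and every name and number in it is illustrative. Under those assumptions the gradient reaching each branch logit is `p - y`, so a confident partner branch shrinks the gradient that a weak branch receives.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def logit_gradient(own_logit: float, other_logit: float = 0.0, label: float = 1.0) -> float:
    """Magnitude of dLoss/d(own_logit) for BCE on summed logits (toy assumption).

    With z = own_logit + other_logit and p = sigmoid(z), the derivative of
    binary cross-entropy with respect to either logit is (p - label).
    """
    p = sigmoid(own_logit + other_logit)
    return abs(p - label)


def suppression_ratio(own_logit: float, other_logit: float) -> float:
    """Gradient under fusion divided by the gradient when trained alone.

    Values below 1 mean the branch receives a smaller gradient once fused.
    """
    return logit_gradient(own_logit, other_logit) / logit_gradient(own_logit)


# Hypothetical logits for a positive sample: one weak and one strong modality.
weak_logit, strong_logit = 0.5, 3.0

weak_ratio = suppression_ratio(weak_logit, strong_logit)
strong_ratio = suppression_ratio(strong_logit, weak_logit)

print(f"weak branch keeps   {weak_ratio:.1%} of its unimodal gradient")
print(f"strong branch keeps {strong_ratio:.1%} of its unimodal gradient")
```

In this toy setting both branches see a smaller gradient after fusion, and the weaker branch loses a larger share of its unimodal gradient than the stronger one, which mirrors points (1) and (2) qualitatively. Real detectors fuse feature maps rather than scalar logits, so this is only an intuition aid.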
Taxonomy
Topics: Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning · Infrared Target Detection Methodologies
