Representation Space Constrained Learning with Modality Decoupling for Multimodal Object Detection

YiKang Shao; Tao Shi

arXiv:2511.15433 · cs.CV · November 20, 2025

Open Access

TL;DR

This paper analyzes fusion degradation in multimodal object detection theoretically and proposes RSC-MD, a method that improves the optimization of modality-specific backbones and achieves state-of-the-art results.

Contribution

The paper introduces a theoretical framework for understanding fusion degradation and proposes RSC-MD, a method that improves modality learning by addressing gradient suppression in the unimodal backbones and imbalanced learning across modalities.

Findings

01 RSC-MD effectively alleviates fusion degradation.

02 The method achieves state-of-the-art performance on multiple datasets.

03 Extensive experiments validate the approach's effectiveness.

Abstract

Multimodal object detection has attracted significant attention in both academia and industry for its enhanced robustness. Although numerous studies have focused on improving modality fusion strategies, most neglect fusion degradation, and none provide a theoretical analysis of its underlying causes. To fill this gap, this paper presents a systematic theoretical investigation of fusion degradation in multimodal detection and identifies two key optimization deficiencies: (1) the gradients of unimodal branch backbones are severely suppressed under multimodal architectures, resulting in under-optimization of the unimodal branches; (2) disparities in modality quality cause weaker modalities to experience stronger gradient suppression, which in turn results in imbalanced modality learning. To address these issues, this paper proposes a Representation Space Constrained Learning with Modality…
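
The two deficiencies named in the abstract can be made concrete with a toy calculation. The sketch below is not the paper's derivation or the RSC-MD method. It assumes a hypothetical additive fusion of two scalar linear branches trained with a squared-error loss, and every name and value in it is illustrative. It compares the gradient a weak branch receives when trained alone with the gradient it receives inside the fused model.

```python
# Toy illustration only: two scalar "branches" fused by addition.
# This is an assumed setup, not the architecture analyzed in the paper.

def unimodal_grad(w, x, y):
    """Gradient of 0.5 * (w*x - y)**2 with respect to w."""
    residual = w * x - y
    return residual * x


def fused_grads(w1, w2, x1, x2, y):
    """Gradients of 0.5 * (w1*x1 + w2*x2 - y)**2 with respect to w1 and w2."""
    residual = w1 * x1 + w2 * x2 - y  # shared residual after fusion
    return residual * x1, residual * x2


# Hypothetical values: branch 1 is untrained (weak), branch 2 is nearly fitted (strong).
x1, x2, y = 1.0, 1.0, 1.0
w_weak, w_strong = 0.0, 0.9

g_weak_alone = unimodal_grad(w_weak, x1, y)
g_strong_alone = unimodal_grad(w_strong, x2, y)
g_weak_fused, g_strong_fused = fused_grads(w_weak, w_strong, x1, x2, y)

# Ratio of fused to stand-alone gradient magnitude for each branch.
weak_ratio = abs(g_weak_fused) / abs(g_weak_alone)
strong_ratio = abs(g_strong_fused) / abs(g_strong_alone)

print("weak branch   fused/alone gradient ratio:", weak_ratio)
print("strong branch fused/alone gradient ratio:", strong_ratio)
```

In this toy setting the strong branch already explains most of the target, so the shared residual is small and the weak branch receives a much smaller gradient than it would on its own, while the strong branch's gradient is essentially unchanged. This mirrors the abstract's claim qualitatively; the paper's analysis concerns full detection architectures and may differ in detail.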

Taxonomy

Topics: Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning · Infrared Target Detection Methodologies