Bridging the RGB-IR Gap: Consensus and Discrepancy Modeling for Text-Guided Multispectral Detection

Jiaqi Wu; Zhen Wang; Enhao Huang; Kangqing Shen; Yulin Wang; Yang Yue; Yifan Pu; Gao Huang

arXiv:2604.11234 · cs.CV · April 14, 2026

TL;DR

This paper introduces a semantic bridge fusion framework for multispectral object detection. It uses text semantics to align the RGB and IR modalities, targeting two problems: the granularity asymmetry between RGB and IR, and the tendency of attention-based fusion to overlook cross-modal discrepancies.

Contribution

The paper proposes bi-support modeling: text serves as a shared semantic bridge between RGB and IR, and a structured fusion step incorporates both consensus support and discrepancy support.

Findings

01. Achieves superior detection performance on multispectral benchmarks.

02. Effectively aligns RGB and IR responses using text-guided semantic mapping.

03. Demonstrates the benefits of modeling cross-modal discrepancies in fusion.

Abstract

Text-guided multispectral object detection uses text semantics to guide semantic-aware cross-modal interaction between RGB and IR for more robust perception. However, notable limitations remain: (1) existing methods often use text only as an auxiliary semantic enhancement signal, without exploiting its guiding role to bridge the inherent granularity asymmetry between RGB and IR; and (2) conventional data-driven attention-based fusion tends to emphasize stable consensus while overlooking potentially valuable cross-modal discrepancies. To address these issues, we propose a semantic bridge fusion framework with bi-support modeling for multispectral object detection. Specifically, text is used as a shared semantic bridge to align RGB and IR responses under a unified category condition, while the recalibrated thermal semantic prior is projected onto the RGB branch for semantic-level mapping…
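To make the idea of consensus and discrepancy supports concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: the function names (`text_response`, `bi_support`), the cosine-similarity response, the use of `min` for consensus and absolute difference for discrepancy, and the `alpha` weight are all assumptions chosen for illustration. It only shows one way a single category's text embedding could condition both modalities, so that their agreement and disagreement can be read off separately.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0


def text_response(features, text_emb):
    """Per-position response in [0, 1] to one category's text embedding.

    Hypothetical stand-in for a text-conditioned response map.
    """
    return [(cosine(f, text_emb) + 1.0) / 2.0 for f in features]


def bi_support(rgb_feats, ir_feats, text_emb, alpha=0.5):
    """Return (consensus, discrepancy, fused) per spatial position.

    Assumed definitions, not taken from the paper:
      consensus   = min of the two responses (both modalities agree)
      discrepancy = absolute difference (one modality sees what the other misses)
      fused       = consensus + alpha * discrepancy
    """
    r_rgb = text_response(rgb_feats, text_emb)
    r_ir = text_response(ir_feats, text_emb)
    consensus = [min(a, b) for a, b in zip(r_rgb, r_ir)]
    discrepancy = [abs(a - b) for a, b in zip(r_rgb, r_ir)]
    fused = [c + alpha * d for c, d in zip(consensus, discrepancy)]
    return consensus, discrepancy, fused


# Toy example: two positions, 2-D features, one category embedding.
text_emb = [1.0, 0.0]
rgb_feats = [[1.0, 0.0], [0.0, 1.0]]  # RGB misses the object at position 1
ir_feats = [[1.0, 0.0], [1.0, 0.0]]   # IR responds at both positions
consensus, discrepancy, fused = bi_support(rgb_feats, ir_feats, text_emb)
```

In this toy setup, position 1 has a lower consensus but a non-zero discrepancy, so a fusion rule that uses only consensus would down-weight a location that one modality detects clearly. With `alpha = 1` the assumed fused score reduces to the element-wise maximum of the two responses.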

Code & Models

Repositories

zhenwang5372/Bridging-RGB-IR-Gap (GitHub)