TeSG: Textual Semantic Guidance for Infrared and Visible Image Fusion

Mingrui Zhu; Xiru Chen; Xin Wei; Nannan Wang; Xinbo Gao

arXiv:2506.16730 · cs.CV · June 23, 2025

TL;DR

TeSG is a text-guided framework for infrared and visible image fusion. It uses large vision-language models to supply textual semantics at two levels, the mask semantic level and the text semantic level, so that the fused images are better suited to downstream tasks such as detection and segmentation.

Contribution

The paper presents TeSG, a method that integrates textual semantic information into infrared and visible image fusion. Guidance is applied at both the mask semantic level and the text semantic level, with both derived from textual descriptions extracted by large vision-language models.

Findings

01 TeSG outperforms existing methods in downstream detection and segmentation tasks.

02 The proposed modules improve the fusion quality by incorporating textual semantics.

03 TeSG demonstrates robustness and versatility across various datasets and scenarios.

Abstract

Infrared and visible image fusion (IVF) aims to combine complementary information from both image modalities, producing more informative and comprehensive outputs. Recently, text-guided IVF has shown great potential due to its flexibility and versatility. However, the effective integration and utilization of textual semantic information remains insufficiently studied. To tackle these challenges, we introduce textual semantics at two levels: the mask semantic level and the text semantic level, both derived from textual descriptions extracted by large Vision-Language Models (VLMs). Building on this, we propose Textual Semantic Guidance for infrared and visible image fusion, termed TeSG, which guides the image synthesis process in a way that is optimized for downstream tasks such as detection and segmentation. Specifically, TeSG consists of three core components: a Semantic Information…

Taxonomy

Topics: Advanced Image Fusion Techniques · Generative Adversarial Networks and Image Synthesis · Multimodal Machine Learning Applications