TextFusion: Unveiling the Power of Textual Semantics for Controllable Image Fusion
Chunyang Cheng, Tianyang Xu, Xiao-Jun Wu, Hui Li, Xi Li, Zhangyong Tang, Josef Kittler

TL;DR
TextFusion leverages textual semantics and a vision-language model to enable controllable, high-quality image fusion, improving over traditional methods by integrating higher-level semantic guidance for various downstream applications.
Contribution
The paper introduces a novel text-guided image fusion framework using a vision-language model and a coarse-to-fine association mechanism, along with a new dataset for the task.
Findings
Outperforms traditional appearance-based fusion methods
Effectively incorporates textual semantics for controllable fusion
Demonstrates robustness across different fusion scenarios
Abstract
Advanced image fusion methods are devoted to generating the fusion results by aggregating the complementary information conveyed by the source images. However, the difference in the source-specific manifestation of the imaged scene content makes it difficult to design a robust and controllable fusion process. We argue that this issue can be alleviated with the help of higher-level semantics, conveyed by the text modality, which should enable us to generate fused images for different purposes, such as visualisation and downstream tasks, in a controllable way. This is achieved by exploiting a vision-and-language model to build a coarse-to-fine association mechanism between the text and image signals. With the guidance of the association maps, an affine fusion unit is embedded in the transformer network to fuse the text and vision modalities at the feature level. As another ingredient of…
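The abstract does not spell out the affine fusion unit, so the following is only a minimal sketch of one common way to do association-guided affine modulation, not the authors' implementation. It assumes a text embedding is projected to a per-channel scale and shift, and that an association map with values in [0, 1] decides how strongly each pixel is modulated. The names `affine_fuse`, `linear`, `w_scale`, and `w_shift` are hypothetical and do not come from the paper.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]   # H x W
FeatureMap = List[Matrix]    # C x H x W


def linear(weights: Matrix, x: Vector) -> Vector:
    """Matrix-vector product: one output value per row of `weights`."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def affine_fuse(
    features: FeatureMap,
    assoc_map: Matrix,
    text_emb: Vector,
    w_scale: Matrix,
    w_shift: Matrix,
) -> FeatureMap:
    """Modulate image features with text-derived scale and shift.

    Assumed form (not taken from the paper):
        out[c][i][j] = f * (1 + a * gamma[c]) + a * beta[c]
    where f is the input feature, a is the association weight at (i, j),
    and gamma/beta are linear projections of the text embedding.
    Where a == 0 the feature passes through unchanged.
    """
    gamma = linear(w_scale, text_emb)  # per-channel scale, length C
    beta = linear(w_shift, text_emb)   # per-channel shift, length C
    fused: FeatureMap = []
    for c, channel in enumerate(features):
        fused.append([
            [f * (1.0 + a * gamma[c]) + a * beta[c]
             for f, a in zip(f_row, a_row)]
            for f_row, a_row in zip(channel, assoc_map)
        ])
    return fused


# Toy example: one channel, a 1 x 2 feature map, a 2-dimensional text embedding.
features = [[[1.0, 2.0]]]
assoc_map = [[0.0, 1.0]]      # only the second pixel is associated with the text
text_emb = [1.0, 2.0]
w_scale = [[0.5, 0.0]]        # gamma = [0.5]
w_shift = [[0.0, 1.0]]        # beta  = [2.0]
print(affine_fuse(features, assoc_map, text_emb, w_scale, w_shift))
# -> [[[1.0, 5.0]]]
```

In this sketch the association map acts as a spatial gate: pixels unrelated to the text keep their original features, which is one plausible reading of how text guidance could be made controllable at the feature level.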
Taxonomy
Topics: Advanced Image Fusion Techniques · Visual Attention and Saliency Detection · Image Retrieval and Classification Techniques
