TextFusion: Unveiling the Power of Textual Semantics for Controllable Image Fusion

Chunyang Cheng; Tianyang Xu; Xiao-Jun Wu; Hui Li; Xi Li; Zhangyong Tang; Josef Kittler

arXiv:2312.14209 · cs.CV · February 9, 2024 · 1 citation

TL;DR

TextFusion leverages textual semantics and a vision-language model to enable controllable, high-quality image fusion, improving over traditional methods by integrating higher-level semantic guidance for various downstream applications.

Contribution

The paper introduces a novel text-guided image fusion framework using a vision-language model and a coarse-to-fine association mechanism, along with a new dataset for the task.

Findings

01. Outperforms traditional appearance-based fusion methods

02. Effectively incorporates textual semantics for controllable fusion

03. Demonstrates robustness across different fusion scenarios

Abstract

Advanced image fusion methods are devoted to generating the fusion results by aggregating the complementary information conveyed by the source images. However, the difference in the source-specific manifestation of the imaged scene content makes it difficult to design a robust and controllable fusion process. We argue that this issue can be alleviated with the help of higher-level semantics, conveyed by the text modality, which should enable us to generate fused images for different purposes, such as visualisation and downstream tasks, in a controllable way. This is achieved by exploiting a vision-and-language model to build a coarse-to-fine association mechanism between the text and image signals. With the guidance of the association maps, an affine fusion unit is embedded in the transformer network to fuse the text and vision modalities at the feature level. As another ingredient of…
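The abstract describes an affine fusion unit that is guided by text-image association maps. As a rough illustration only, the sketch below applies a spatially gated scale and shift to a single-channel feature map. It is not the authors' implementation: the function name `affine_fuse`, the scalar parameters `gamma` and `beta` (stand-ins for whatever text-derived modulation the real unit uses), and the gating formula itself are all assumptions made for this example.

```python
from typing import List

Matrix = List[List[float]]


def affine_fuse(features: Matrix, assoc_map: Matrix,
                gamma: float, beta: float) -> Matrix:
    """Hypothetical affine modulation gated by an association map.

    For each position: out = (1 + m * gamma) * f + m * beta,
    where f is the image feature and m in [0, 1] is the association
    weight between the text and that position. Where m == 0 the
    feature passes through unchanged.
    """
    same_shape = len(features) == len(assoc_map) and all(
        len(f_row) == len(m_row)
        for f_row, m_row in zip(features, assoc_map)
    )
    if not same_shape:
        raise ValueError("features and assoc_map must have the same shape")

    return [
        [(1.0 + m * gamma) * f + m * beta for f, m in zip(f_row, m_row)]
        for f_row, m_row in zip(features, assoc_map)
    ]


# Toy 2x2 example: only positions with a non-zero association weight
# are modulated by the (assumed) text-derived scale and shift.
features = [[1.0, 2.0], [3.0, 4.0]]
assoc_map = [[0.0, 1.0], [0.5, 0.0]]
fused = affine_fuse(features, assoc_map, gamma=0.5, beta=1.0)
```

In this toy form, the association map acts as a per-position gate, so changing the text (and therefore the map and the modulation parameters) changes which regions of the feature map are altered. That is one simple way to picture the controllability the abstract refers to.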

Code & Models

Repositories

awcxv/textfusion (PyTorch, official)

Taxonomy

Topics: Advanced Image Fusion Techniques · Visual Attention and Saliency Detection · Image Retrieval and Classification Techniques