Multi-Grained Text-Guided Image Fusion for Multi-Exposure and Multi-Focus Scenarios

Mingwei Tang; Jiahao Nie; Guang Yang; Ziqing Cui; Jie Li

arXiv:2512.20556 · cs.CV · December 24, 2025

Open Access

TL;DR

This paper introduces a multi-grained text-guided image fusion method that leverages hierarchical textual descriptions and saliency-driven modules to improve fusion quality in challenging multi-exposure and multi-focus scenarios.

Contribution

The paper proposes a novel multi-grained textual guidance framework that uses hierarchical cross-modal modulation and saliency enrichment to improve image fusion performance.

Findings

01. Outperforms previous methods on multi-exposure fusion tasks.

02. Effectively aligns visual and textual features at multiple granularities.

03. Enhances fusion quality with dense semantic content augmentation.

Abstract

Image fusion aims to synthesize a single high-quality image from a pair of inputs captured under challenging conditions, such as differing exposure levels or focal depths. A core challenge lies in effectively handling disparities in dynamic range and focus depth between the inputs. With the advent of vision-language models, recent methods incorporate textual descriptions as auxiliary guidance to enhance fusion quality. However, simply incorporating coarse-grained descriptions hampers the understanding of fine-grained details and poses challenges for precise cross-modal alignment. To address these limitations, we propose Multi-grained Text-guided Image Fusion (MTIF), a novel fusion paradigm with three key designs. First, it introduces multi-grained textual descriptions that separately capture fine details, structural cues, and semantic content, guiding image fusion through a hierarchical…
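
To make the basic fusion setting concrete, the sketch below fuses an under-exposed and an over-exposed grayscale image with a classic per-pixel well-exposedness weighting. This is not MTIF and involves no text guidance; it only illustrates the baseline problem of combining a pair of inputs with different exposure levels. The function names, the `sigma` value, and the toy pixel values are assumptions chosen for illustration.

```python
import math


def well_exposedness(value, sigma=0.2):
    """Gaussian weight that peaks when a pixel intensity is near mid-gray (0.5)."""
    return math.exp(-((value - 0.5) ** 2) / (2 * sigma ** 2))


def fuse_exposures(under, over, sigma=0.2, eps=1e-12):
    """Fuse two same-sized grayscale images given as nested lists in [0, 1].

    Each output pixel is a weighted average of the two input pixels, where
    the weight favors whichever input is closer to mid-gray at that pixel.
    Illustrative baseline only; hypothetical helper, not the paper's method.
    """
    if len(under) != len(over) or any(
        len(a) != len(b) for a, b in zip(under, over)
    ):
        raise ValueError("inputs must have the same shape")

    fused = []
    for row_u, row_o in zip(under, over):
        fused_row = []
        for u, o in zip(row_u, row_o):
            w_u = well_exposedness(u, sigma)
            w_o = well_exposedness(o, sigma)
            # eps guards against division by zero if both weights underflow.
            fused_row.append((w_u * u + w_o * o) / (w_u + w_o + eps))
        fused.append(fused_row)
    return fused


# Toy 2x2 inputs (made-up values): a dark capture and a bright capture.
under = [[0.05, 0.45], [0.10, 0.50]]
over = [[0.55, 0.95], [0.60, 0.98]]
fused = fuse_exposures(under, over)
```

In this toy case each fused pixel is pulled toward whichever input is better exposed at that location. A text-guided method such as the one described in the abstract would instead condition the fusion on textual descriptions, which this sketch does not attempt to model.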

Taxonomy

Topics: Advanced Image Fusion Techniques · Image Enhancement Techniques · Visual Attention and Saliency Detection