Entity-Guided Multi-Task Learning for Infrared and Visible Image Fusion

Wenyu Shao; Hongbo Liu; Yunchuan Ma; Ruili Wang

arXiv:2601.01870 · cs.CV · January 6, 2026

TL;DR

This paper introduces EGMT, a multi-task learning framework that uses entity-level textual information and cross-modal feature interaction to improve infrared and visible image fusion quality.

Contribution

The paper proposes an entity-guided multi-task learning approach that extracts semantically rich entity-level text from image captions and strengthens feature interaction to improve image fusion results.

Findings

01 EGMT outperforms state-of-the-art methods in preserving salient targets.
02 It improves texture detail and semantic consistency in fused images.
03 The approach effectively utilizes entity-level textual supervision.

Abstract

Existing text-driven infrared and visible image fusion approaches often rely on textual information at the sentence level, which can lead to semantic noise from redundant text and fail to fully exploit the deeper semantic value of textual information. To address these issues, we propose a novel fusion approach named Entity-Guided Multi-Task learning for infrared and visible image fusion (EGMT). Our approach includes three key innovative components: (i) A principled method is proposed to extract entity-level textual information from image captions generated by large vision-language models, eliminating semantic noise from raw text while preserving critical semantic information; (ii) A parallel multi-task learning architecture is constructed, which integrates image fusion with a multi-label classification task. By using entities as pseudo-labels, the multi-label classification task…
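
The idea in (ii) of using entities as pseudo-labels for a multi-label classification task can be illustrated with a minimal sketch. This is not the authors' implementation: the helper names (build_vocab, multi_hot, bce_with_logits, total_loss), the example entities, and the weighted-sum combination with its weight of 0.5 are all assumptions made for illustration, and the binary cross-entropy shown is simply the common choice for multi-label targets.

```python
import math

def build_vocab(entity_lists):
    """Map each distinct entity string to a class index (sorted for determinism)."""
    entities = sorted({e for ents in entity_lists for e in ents})
    return {e: i for i, e in enumerate(entities)}

def multi_hot(entities, vocab):
    """Encode the entities of one image pair as a 0/1 pseudo-label vector."""
    label = [0.0] * len(vocab)
    for e in entities:
        label[vocab[e]] = 1.0
    return label

def bce_with_logits(logits, targets):
    """Mean binary cross-entropy over classes, in the numerically stable form
    max(z, 0) - z*t + log(1 + exp(-|z|))."""
    total = 0.0
    for z, t in zip(logits, targets):
        total += max(z, 0.0) - z * t + math.log1p(math.exp(-abs(z)))
    return total / len(logits)

def total_loss(fusion_loss, cls_loss, weight=0.5):
    """Hypothetical multi-task objective: fusion loss plus weighted classification loss."""
    return fusion_loss + weight * cls_loss

# Hypothetical entities extracted from two captions.
entity_lists = [["person", "car"], ["car", "tree"]]
vocab = build_vocab(entity_lists)
pseudo_label = multi_hot(entity_lists[0], vocab)

# Logits would come from a classification head; zeros stand in here.
cls_loss = bce_with_logits([0.0, 0.0, 0.0], pseudo_label)
print(vocab, pseudo_label, total_loss(1.0, cls_loss))
```

In this sketch the classification branch shares nothing with the fusion branch except the summed loss; how the two tasks actually share features in EGMT is not stated in the text above.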

Taxonomy

Topics: Advanced Image Fusion Techniques · Multimodal Machine Learning Applications · Generative Adversarial Networks and Image Synthesis