Entity-Guided Multi-Task Learning for Infrared and Visible Image Fusion
Wenyu Shao, Hongbo Liu, Yunchuan Ma, Ruili Wang

TL;DR
This paper introduces EGMT, a novel multi-task learning framework that leverages entity-level textual information and cross-modal interactions to significantly improve infrared and visible image fusion quality.
Contribution
The paper proposes an entity-guided multi-task learning approach that extracts semantically rich entity-level textual information from image captions and strengthens cross-modal feature interaction to improve fusion results.
Findings
EGMT outperforms state-of-the-art methods in preserving salient targets.
It improves texture detail and semantic consistency in fused images.
The approach effectively utilizes entity-level textual supervision.
Abstract
Existing text-driven infrared and visible image fusion approaches often rely on textual information at the sentence level, which can lead to semantic noise from redundant text and fail to fully exploit the deeper semantic value of textual information. To address these issues, we propose a novel fusion approach named Entity-Guided Multi-Task learning for infrared and visible image fusion (EGMT). Our approach includes three key innovative components: (i) A principled method is proposed to extract entity-level textual information from image captions generated by large vision-language models, eliminating semantic noise from raw text while preserving critical semantic information; (ii) A parallel multi-task learning architecture is constructed, which integrates image fusion with a multi-label classification task. By using entities as pseudo-labels, the multi-label classification task…
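As an illustration of the pseudo-label idea in components (i) and (ii), the following minimal sketch turns a caption into a multi-hot entity label and adds a multi-label classification term to a fusion loss. It is not the authors' implementation. The entity vocabulary, the keyword-matching extraction, the function names, and the weight `cls_weight` are all hypothetical stand-ins, and the fusion loss is taken as a given number.

```python
import math

# Hypothetical entity vocabulary; the paper's actual entity set is not stated here.
ENTITY_VOCAB = ["person", "car", "tree", "road", "building"]


def entities_to_multi_hot(caption, vocab=ENTITY_VOCAB):
    """Build a multi-hot pseudo-label: 1.0 where a vocabulary entity occurs in the caption.

    Exact keyword matching is an assumption that stands in for the paper's
    entity extraction method (plurals such as "cars" are not matched).
    """
    words = {w.strip(".,;:!?").lower() for w in caption.split()}
    return [1.0 if entity in words else 0.0 for entity in vocab]


def binary_cross_entropy(probs, targets, eps=1e-7):
    """Mean binary cross-entropy over all entity classes."""
    total = 0.0
    for p, t in zip(probs, targets):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += -(t * math.log(p) + (1.0 - t) * math.log(1.0 - p))
    return total / len(targets)


def multi_task_loss(fusion_loss, probs, targets, cls_weight=0.5):
    """Combine the fusion loss with the weighted classification loss.

    cls_weight is an assumed hyperparameter, not a value from the paper.
    """
    return fusion_loss + cls_weight * binary_cross_entropy(probs, targets)


labels = entities_to_multi_hot("A person walks past a car on the road.")
loss = multi_task_loss(1.0, [0.5] * len(ENTITY_VOCAB), labels)
```

Here `labels` marks `person`, `car`, and `road` as present. In a real pipeline the probabilities would come from a classification head that shares features with the fusion network.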
Taxonomy
Topics: Advanced Image Fusion Techniques · Multimodal Machine Learning Applications · Generative Adversarial Networks and Image Synthesis
