TL;DR
This paper introduces CrossFuse, a novel cross attention mechanism designed to improve infrared and visible image fusion by emphasizing complementarity, along with a two-stage training scheme that achieves state-of-the-art fusion performance.
Contribution
The paper proposes a new cross attention mechanism tailored for image fusion that enhances complementary information, along with a two-stage training strategy for effective fusion.
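The difference between ordinary cross attention, which rewards correlation, and a complementarity-oriented variant can be sketched in a few lines. This page does not give the CAM's equations, so the sketch assumes one simple way to favor uncorrelated features: negate the query-key similarity before the softmax. It is an illustration, not the paper's exact formulation. `attention_weights`, `cross_attention`, and the `complementary` flag are hypothetical names.

```python
import math
from typing import List

Matrix = List[List[float]]

def attention_weights(query: Matrix, key: Matrix, complementary: bool = False) -> Matrix:
    """Row-wise softmax of scaled query-key dot products.

    complementary=False: standard cross attention, so more similar keys get more weight.
    complementary=True: hypothetical variant (an assumption, not CrossFuse's published
    formula) that negates the scores first, so the least similar keys get the most weight.
    """
    scale = 1.0 / math.sqrt(len(query[0]))
    sign = -1.0 if complementary else 1.0
    weights = []
    for q in query:
        scores = [sign * scale * sum(a * b for a, b in zip(q, k)) for k in key]
        m = max(scores)  # subtract the max for numerical stability
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        weights.append([e / total for e in exps])
    return weights

def cross_attention(query: Matrix, key: Matrix, value: Matrix,
                    complementary: bool = False) -> Matrix:
    """Mix the rows of `value` (one modality's features) using weights computed
    from `query` (the other modality) against `key`."""
    weights = attention_weights(query, key, complementary)
    dim = len(value[0])
    return [[sum(w * v[j] for w, v in zip(row, value)) for j in range(dim)]
            for row in weights]

# Toy tokens: one infrared query attends over two visible tokens.
ir_query = [[1.0, 0.0]]
vis_keys = [[1.0, 0.0], [0.0, 1.0]]   # first key matches the query, second does not
vis_values = [[10.0], [20.0]]

std_out = cross_attention(ir_query, vis_keys, vis_values)
comp_out = cross_attention(ir_query, vis_keys, vis_values, complementary=True)
```

With these inputs the standard output leans toward the matching token's value (10.0), while the negated variant leans toward the non-matching one (20.0). Negating the scores is only one way to reward uncorrelation; the paper's CAM may be defined differently.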
Findings
Achieves state-of-the-art fusion performance on benchmark datasets.
Effectively enhances complementary information while reducing redundancy.
Demonstrates superior results compared to existing fusion methods.
Abstract
Multimodal visual information fusion aims to integrate multi-sensor data into a single image that contains more complementary information and fewer redundant features. However, complementary information is hard to extract, especially for infrared and visible images, which exhibit a large similarity gap between the two modalities. Common cross attention modules consider only correlation, whereas image fusion tasks need to focus on complementarity (uncorrelation). Hence, in this paper, a novel cross attention mechanism (CAM) is proposed to enhance the complementary information. Furthermore, a fusion scheme based on a two-stage training strategy is presented to generate the fused images. In the first stage, two auto-encoder networks with the same architecture are trained, one for each modality. Then, with the encoders fixed, the CAM and a decoder are trained in the second stage. With…
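The two-stage schedule described in the abstract trains per-modality auto-encoders first, then trains the CAM and a decoder on top of frozen encoders. It can be illustrated with a toy in which every "network" is a single scalar weight. Everything concrete here is an assumption: the scalar networks, the weighted sum standing in for the CAM, the stage-2 loss (the abstract does not state CrossFuse's losses), and the names `train`, `recon_loss`, and `fusion_loss` are all hypothetical. The sketch only shows which parameters update in each stage.

```python
from typing import Callable, Dict, Sequence

Params = Dict[str, float]

def train(params: Params, loss_fn: Callable[[Params], float],
          trainable: Sequence[str], lr: float = 0.1, steps: int = 500,
          eps: float = 1e-6) -> Params:
    """Gradient descent on the `trainable` keys only; all other keys stay frozen.
    Gradients are central finite differences, which is adequate for a toy."""
    for _ in range(steps):
        grads = {}
        for name in trainable:
            saved = params[name]
            params[name] = saved + eps
            loss_plus = loss_fn(params)
            params[name] = saved - eps
            loss_minus = loss_fn(params)
            params[name] = saved
            grads[name] = (loss_plus - loss_minus) / (2 * eps)
        for name, g in grads.items():
            params[name] -= lr * g
    return params

# Toy "images": one scalar intensity per pixel (made-up values).
IR = [0.9, 0.8, 0.1, 0.2]
VIS = [0.2, 0.3, 0.7, 0.9]

def recon_loss(p: Params, mod: str, xs: Sequence[float]) -> float:
    # Stage-1 objective for one modality: decode(encode(x)) should reproduce x.
    return sum((p[f"dec_{mod}"] * p[f"enc_{mod}"] * x - x) ** 2 for x in xs) / len(xs)

def fusion_loss(p: Params) -> float:
    # Hypothetical stage-2 objective: fused output stays close to both inputs.
    # The weighted sum of encoded features is a stand-in for the CAM.
    total = 0.0
    for x_ir, x_vis in zip(IR, VIS):
        fused_feat = p["w_ir"] * p["enc_ir"] * x_ir + p["w_vis"] * p["enc_vis"] * x_vis
        y = p["dec_fuse"] * fused_feat
        total += (y - x_ir) ** 2 + (y - x_vis) ** 2
    return total / len(IR)

params = {"enc_ir": 0.5, "dec_ir": 0.5, "enc_vis": 0.5, "dec_vis": 0.5,
          "w_ir": 0.1, "w_vis": 0.1, "dec_fuse": 0.5}

# Stage 1: train one auto-encoder per modality, independently.
train(params, lambda p: recon_loss(p, "ir", IR), ["enc_ir", "dec_ir"])
train(params, lambda p: recon_loss(p, "vis", VIS), ["enc_vis", "dec_vis"])

# Stage 2: encoders frozen; only the fusion weights and fusion decoder update.
frozen = {k: params[k] for k in ("enc_ir", "enc_vis")}
loss_before = fusion_loss(params)
train(params, fusion_loss, ["w_ir", "w_vis", "dec_fuse"])
loss_after = fusion_loss(params)
```

Freezing is expressed here by simply leaving the encoder keys out of the `trainable` list in stage 2. A deep-learning framework would achieve the same effect by disabling gradients on the encoder parameters.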
