FuseFormer: A Transformer for Visual and Thermal Image Fusion
Aytekin Erdogan, Erdem Akagündüz

TL;DR
FuseFormer introduces a transformer-based multi-scale image fusion method that combines CNNs and Transformers, improving fusion quality by addressing limitations of traditional evaluation-metric-based loss functions.
Contribution
The paper proposes a novel two-stage training framework with a transformer-based fusion strategy and a new loss function to enhance visual and thermal image fusion performance.
Findings
Outperforms existing fusion algorithms on benchmark datasets
Effectively combines local and global features for improved fusion quality
Mitigates bias introduced by traditional evaluation-metric-based loss functions
Abstract
Due to the lack of a definitive ground truth for the image fusion problem, the loss functions are structured based on evaluation metrics, such as the structural similarity index measure (SSIM). However, in doing so, a bias is introduced toward the SSIM and, consequently, the input visual band image. The objective of this study is to propose a novel methodology for the image fusion problem that mitigates the limitations associated with using classical evaluation metrics as loss functions. Our approach integrates a transformer-based multi-scale fusion strategy that adeptly addresses local and global context information. This integration not only refines the individual components of the image fusion process but also significantly enhances the overall efficacy of the method. Our proposed method follows a two-stage training approach, where an auto-encoder is initially trained to extract deep…
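The bias described in the abstract can be illustrated with a minimal sketch. It is not the paper's loss function. It assumes a simplified single-window SSIM computed over the whole image (the standard metric averages over local windows), and the pixel values and the names `global_ssim` and `ssim_loss` are made up for illustration. The point is only that a loss of the form 1 - SSIM(fused, visible) is smallest when the fused output simply copies the visible-band input, so thermal information is penalized.

```python
from statistics import fmean


def global_ssim(x, y, dynamic_range=1.0):
    """Single-window SSIM between two flattened images (equal-length lists).

    Simplification: the usual SSIM averages this quantity over local windows;
    here it is computed once over all pixels.
    """
    if len(x) != len(y) or not x:
        raise ValueError("inputs must be non-empty and of equal length")
    # Standard stabilizing constants from the SSIM definition.
    c1 = (0.01 * dynamic_range) ** 2
    c2 = (0.03 * dynamic_range) ** 2
    mu_x, mu_y = fmean(x), fmean(y)
    var_x = fmean([(a - mu_x) ** 2 for a in x])
    var_y = fmean([(b - mu_y) ** 2 for b in y])
    cov = fmean([(a - mu_x) * (b - mu_y) for a, b in zip(x, y)])
    numerator = (2 * mu_x * mu_y + c1) * (2 * cov + c2)
    denominator = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return numerator / denominator


def ssim_loss(fused, reference):
    """Hypothetical metric-based loss: 1 - SSIM against a single reference."""
    return 1.0 - global_ssim(fused, reference)


# Made-up 6-pixel "images" with intensities in [0, 1].
visible = [0.1, 0.4, 0.8, 0.3, 0.6, 0.9]
thermal = [0.9, 0.2, 0.1, 0.7, 0.3, 0.5]
average = [(v + t) / 2 for v, t in zip(visible, thermal)]

loss_copy = ssim_loss(visible, visible)  # fused output copies the visible image
loss_avg = ssim_loss(average, visible)   # fused output mixes in thermal content

print(f"copy visible:    {loss_copy:.4f}")
print(f"average of both: {loss_avg:.4f}")
```

Under these assumptions the copy case yields a loss close to zero, while the averaged output, which actually carries thermal content, is scored worse. This is the kind of pull toward the visible band that the abstract attributes to metric-based losses.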
Taxonomy
Topics: Advanced Image Fusion Techniques · Image and Signal Denoising Methods · Infrared Thermography in Medicine
Methods: Attention Is All You Need · Layer Normalization · Absolute Position Encodings · Linear Layer · Byte Pair Encoding · Multi-Head Attention · Residual Connection · Dense Connections · Position-Wise Feed-Forward Layer · Label Smoothing
