# CAFusion: A progressive ConvMixer network for context-aware infrared and visible image fusion

**Authors:** Hafiz Tayyab Mustafa, Hamza Mustafa, Hassan Alhuzali, Mujtaba Asad, Zhonglong Zheng

PMC · DOI: 10.1371/journal.pone.0339828 · PLOS One · 2026-01-08

## TL;DR

This paper introduces CAFusion, a deep learning framework for fusing visible and infrared images that aims to improve fusion quality while keeping computational cost low.

## Contribution

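A context-aware ConvMixer block, which combines dilated and depthwise separable convolutions, and an attention-based intermodality progressive fusion strategy together improve fusion quality and computational efficiency.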
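The abstract attributes the block's efficiency to depthwise separable convolutions and its wider context to dilated convolutions. The arithmetic behind both claims fits in a few lines of Python. The channel count (64), kernel size (3), and dilation schedule [1, 2, 4] are hypothetical values chosen for illustration, not CAFusion's published configuration. Bias terms are ignored.

```python
# Parameter and receptive-field arithmetic for the two convolution types that
# the abstract says the context-aware ConvMixer block combines.
# All concrete sizes used below are hypothetical, not CAFusion's settings.


def standard_conv_params(c_in: int, c_out: int, k: int) -> int:
    """Weights of a standard k x k convolution (bias ignored)."""
    return c_in * c_out * k * k


def depthwise_separable_params(c_in: int, c_out: int, k: int) -> int:
    """Depthwise k x k conv (one filter per input channel) followed by a
    1 x 1 pointwise conv that mixes channels (bias ignored)."""
    depthwise = c_in * k * k
    pointwise = c_in * c_out
    return depthwise + pointwise


def dilated_kernel_extent(k: int, d: int) -> int:
    """Spatial extent covered by a single k x k kernel with dilation d."""
    return k + (k - 1) * (d - 1)


def stacked_receptive_field(k: int, dilations: list[int]) -> int:
    """Receptive field of stride-1 k x k convolutions applied in sequence."""
    rf = 1
    for d in dilations:
        rf += (k - 1) * d
    return rf


if __name__ == "__main__":
    c, k = 64, 3  # hypothetical channel count and kernel size
    print(standard_conv_params(c, c, k))          # 36864
    print(depthwise_separable_params(c, c, k))    # 4672
    print(dilated_kernel_extent(k, 2))            # 5
    print(stacked_receptive_field(k, [1, 1, 1]))  # 7
    print(stacked_receptive_field(k, [1, 2, 4]))  # 15
```

Under these assumed settings, the depthwise separable layer needs roughly 8x fewer weights than a standard 3 x 3 convolution, and three stacked 3 x 3 layers with dilations 1, 2, 4 cover a 15 x 15 window instead of 7 x 7, with no extra parameters.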

## Key findings

- CAFusion outperforms state-of-the-art methods in fusion quality and computational efficiency.
- On the TNO dataset, CAFusion achieves a 0.769 SSIM score, a 2.07% improvement over the best competitor.
- The proposed method preserves both low- and high-level image details through a hierarchical multiscale decoder.
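The summary reports CAFusion's SSIM and its gain but not the competitor's score. The helper below backs that score out, assuming the 2.07% is a relative improvement, (ours - best) / best. This is an inference, not a reported number. Because 0.769 is itself rounded, the result is approximate. If the gain were instead an absolute gap in percentage points, the competitor would sit at 0.769 - 0.0207.

```python
# Back out the best competitor's SSIM from the figures in the key findings.
# Assumption: "2.07%" is a relative gain, i.e. (ours - best) / best * 100.


def implied_baseline(score: float, rel_gain_pct: float) -> float:
    """Baseline b such that (score - b) / b == rel_gain_pct / 100."""
    return score / (1.0 + rel_gain_pct / 100.0)


def relative_gain_pct(score: float, baseline: float) -> float:
    """Relative improvement of score over baseline, in percent."""
    return (score - baseline) / baseline * 100.0


baseline = implied_baseline(0.769, 2.07)
print(round(baseline, 4))                             # 0.7534
print(round(relative_gain_pct(0.769, baseline), 2))   # 2.07
```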

## Abstract

Image fusion is a challenging task that aims to generate a composite image by combining information from diverse sources. While deep learning (DL) algorithms have achieved promising results, most rely on complex encoders or attention mechanisms, leading to high computational cost and potential information loss during one-step feature fusion. We introduce CAFusion, a DL framework for visible (VI) and infrared (IR) image fusion. In particular, we propose a context-aware ConvMixer block that uniquely integrates dilated convolutions for expanded receptive fields with depthwise separable convolutions for parameter efficiency. Unlike existing CNN- or transformer-based modules, our block captures multi-scale contextual information without attention mechanisms while remaining computationally efficient. Additionally, we employ an attention-based intermodality multi-level progressive fusion strategy that adaptively combines multi-scale modality-specific features. A hierarchical multiscale decoder reconstructs the fused image by aggregating information across different levels, preserving both low- and high-level details. Comparative evaluations on benchmark datasets demonstrate that CAFusion outperforms recent transformer-based and state-of-the-art (SOTA) DL-based approaches in fusion quality and computational efficiency. In particular, on the TNO benchmark dataset, CAFusion achieves a structural similarity index measure (SSIM) score of 0.769, a 2.07 percent increase over the best competing method.
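The abstract does not say how the attention-based intermodality fusion computes its weights. One common way to realize "adaptive combination" of IR and VI features at each scale is a per-position softmax over activity levels. The sketch below uses the mean absolute activation across channels as that activity level. This is a well-known fusion heuristic swapped in for illustration, not CAFusion's learned attention. The names `fuse_level`, `fuse_pyramid`, and `temperature` are hypothetical. Feature maps are plain nested lists with spatial positions flattened. The cross-level "progressive" coupling is not modeled.

```python
import math

# Hypothetical layout: one feature map is [positions][channels].
Feature = list[list[float]]


def activity(vec: list[float]) -> float:
    """Mean absolute activation of one spatial position across channels."""
    return sum(abs(v) for v in vec) / len(vec)


def fuse_level(f_ir: Feature, f_vi: Feature, temperature: float = 1.0) -> Feature:
    """Softmax-weight the two modalities at every position and blend them."""
    fused = []
    for a, b in zip(f_ir, f_vi):
        s_ir = activity(a) / temperature
        s_vi = activity(b) / temperature
        m = max(s_ir, s_vi)  # subtract the max for numerical stability
        e_ir, e_vi = math.exp(s_ir - m), math.exp(s_vi - m)
        w_ir = e_ir / (e_ir + e_vi)
        w_vi = 1.0 - w_ir
        fused.append([w_ir * x + w_vi * y for x, y in zip(a, b)])
    return fused


def fuse_pyramid(ir_levels: list[Feature], vi_levels: list[Feature]) -> list[Feature]:
    """Fuse each encoder level separately; a decoder would then aggregate them."""
    if len(ir_levels) != len(vi_levels):
        raise ValueError("both modalities need the same number of levels")
    return [fuse_level(a, b) for a, b in zip(ir_levels, vi_levels)]


if __name__ == "__main__":
    # One level, two positions, two channels per modality.
    ir = [[[2.0, 2.0], [0.0, 0.0]]]
    vi = [[[0.0, 0.0], [1.0, 1.0]]]
    for level in fuse_pyramid(ir, vi):
        print([[round(v, 4) for v in pos] for pos in level])
    # [[1.7616, 1.7616], [0.7311, 0.7311]]
```

Each position leans toward whichever modality is more active there: IR dominates the first position and VI the second. In a trained network, the activity score would be replaced by learned attention over tensors, and the fused levels would feed the hierarchical multiscale decoder.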

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12782428/full.md

## Figures

7 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12782428/full.md

## References

42 references — full list in the complete paper: https://tomesphere.com/paper/PMC12782428/full.md

---
Source: https://tomesphere.com/paper/PMC12782428