# TVNet: Multimodal medical image fusion by dual-branch network with vision transformer and one-shot aggregation

**Authors:** Jianguo Wang, Wenran Jia, Yuhang Liu, Pengfei Wu, Peng Geng, Xuguang Meng

PMC · DOI: 10.1177/00368504251375188 · Science Progress · 2025-11-04

## TL;DR

TVNet fuses multimodal medical images with a dual-branch network that pairs a Vision Transformer with a CNN, so that the fused image better preserves fine detail and structure.

## Contribution

TVNet introduces a dual-branch network combining Vision Transformer and CNN for improved multimodal medical image fusion.

## Key findings

- TVNet outperforms seven state-of-the-art methods in both subjective visual effects and objective metrics.
- The network preserves detailed textures and structural features in fused images.
- A hybrid loss function optimizes fusion results at three levels: structure, feature, and gradient.
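The three-level loss in the last finding can be pictured with a small sketch. This summary does not give the paper's actual loss terms or weights, so everything below is an assumption for illustration: the function names, the forward-difference gradient, the "match the stronger source gradient" target, and the default weights are all hypothetical. Only the gradient term is spelled out; the structure and feature terms are passed in as precomputed numbers.

```python
from typing import List, Sequence, Tuple

Image = List[List[float]]  # grayscale image as rows of pixel intensities


def gradient_magnitude(img: Image) -> Image:
    """L1 gradient magnitude from forward differences (zero past the border)."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            dx = img[y][x + 1] - img[y][x] if x + 1 < w else 0.0
            dy = img[y + 1][x] - img[y][x] if y + 1 < h else 0.0
            out[y][x] = abs(dx) + abs(dy)
    return out


def gradient_loss(fused: Image, src_a: Image, src_b: Image) -> float:
    """Mean absolute gap between the fused gradient and the stronger source gradient.

    Assumption: the target is the element-wise maximum of the two source
    gradients, a common choice in fusion work, not necessarily TVNet's.
    """
    gf = gradient_magnitude(fused)
    ga = gradient_magnitude(src_a)
    gb = gradient_magnitude(src_b)
    h, w = len(gf), len(gf[0])
    total = sum(
        abs(gf[y][x] - max(ga[y][x], gb[y][x]))
        for y in range(h)
        for x in range(w)
    )
    return total / (h * w)


def hybrid_loss(
    structure_term: float,
    feature_term: float,
    gradient_term: float,
    weights: Tuple[float, float, float] = (1.0, 1.0, 1.0),  # hypothetical weights
) -> float:
    """Weighted sum of the structure, feature, and gradient terms."""
    w_s, w_f, w_g = weights
    return w_s * structure_term + w_f * feature_term + w_g * gradient_term
```

A fused image identical to both sources incurs zero gradient loss, while a flat fused image is penalized wherever either source has an edge.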

## Abstract

Medical image fusion synthesizes complementary information from medical images of different modalities, which is of great significance for clinical diagnosis. Existing medical image fusion algorithms rely heavily on convolution operations and cannot establish long-range dependencies in the source images, which can lead to edge blurring and loss of detail in the fused images. Because the Transformer can effectively model long-range dependencies through self-attention, a novel dual-branch feature enhancement network called TVNet is proposed to fuse multimodal medical images. The network combines a Vision Transformer and a Convolutional Neural Network to extract global context information and local information, preserving detailed textures and highlighting the structural characteristics of the source images. Furthermore, an enhancement module extracts multiscale representations of the images while efficiently aggregating the information from the two branches. In addition, a hybrid loss function is designed to optimize the fusion results at three levels: structure, feature, and gradient. Experimental results show that the proposed fusion network outperforms seven state-of-the-art methods in both subjective visual effects and objective metrics. Our code is available at https://github.com/sineagles/TVNet.
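To make the contrast between the two branches concrete, here is a toy sketch rather than the TVNet architecture itself. It assumes a sequence of patch tokens, a single attention head with identity query/key/value projections, a fixed 3-tap smoothing kernel standing in for a learned convolution, and channel concatenation as the aggregation step. All function names are hypothetical, and the enhancement module is not modeled.

```python
import math
from typing import List, Sequence

Vector = List[float]


def softmax(xs: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]


def self_attention(tokens: List[Vector]) -> List[Vector]:
    """Global branch: each output token is a weighted mix of ALL tokens."""
    d = len(tokens[0])
    out = []
    for q in tokens:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in tokens]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, tokens)) for j in range(d)])
    return out


def local_conv(tokens: List[Vector], kernel: Sequence[float] = (0.25, 0.5, 0.25)) -> List[Vector]:
    """Local branch: each output token sees only its immediate neighbors (zero padding)."""
    n, d, r = len(tokens), len(tokens[0]), len(kernel) // 2
    out = []
    for i in range(n):
        acc = [0.0] * d
        for k, wk in enumerate(kernel):
            j = i + k - r
            if 0 <= j < n:
                for c in range(d):
                    acc[c] += wk * tokens[j][c]
        out.append(acc)
    return out


def dual_branch(tokens: List[Vector]) -> List[Vector]:
    """Aggregate both branches by concatenating their channels (an assumed choice)."""
    global_feats = self_attention(tokens)
    local_feats = local_conv(tokens)
    return [g + l for g, l in zip(global_feats, local_feats)]


if __name__ == "__main__":
    patches = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
    for row in dual_branch(patches):
        print([round(v, 3) for v in row])
```

The attention output for any token depends on every other token regardless of distance, whereas the convolution output depends only on a fixed neighborhood. This is the long-range versus local trade-off the abstract describes.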

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12586861/full.md

## Figures

12 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12586861/full.md

## References

38 references — full list in the complete paper: https://tomesphere.com/paper/PMC12586861/full.md

---
Source: https://tomesphere.com/paper/PMC12586861