Unleashing Vision Foundation Models for Coronary Artery Segmentation: Parallel ViT-CNN Encoding and Variational Fusion

Caixia Dong; Duwei Dai; Xinyi Han; Fan Liu; Xu Yang; Zongfang Li; Songhua Xu

arXiv:2507.12938·eess.IV·July 18, 2025

TL;DR

This paper introduces a novel coronary artery segmentation framework that combines vision transformer and CNN encoders with variational fusion and uncertainty refinement, significantly improving accuracy and generalization.

Contribution

It presents a new parallel ViT-CNN encoding architecture with variational fusion and an uncertainty refinement module for improved coronary artery segmentation.

Findings

01 Outperforms state-of-the-art methods on multiple datasets

02 Achieves higher segmentation accuracy and robustness

03 Demonstrates strong generalization across datasets

Abstract

Accurate coronary artery segmentation is critical for computer-aided diagnosis of coronary artery disease (CAD), yet it remains challenging due to the vessels' small size, complex morphology, and low contrast with surrounding tissues. To address these challenges, we propose a novel segmentation framework that leverages the power of vision foundation models (VFMs) through a parallel encoding architecture. Specifically, a vision transformer (ViT) encoder within the VFM captures global structural features, enhanced by the activation of the final two ViT blocks and the integration of an attention-guided enhancement (AGE) module, while a convolutional neural network (CNN) encoder extracts local details. These complementary features are adaptively fused using a cross-branch variational fusion (CVF) module, which models latent distributions and applies variational attention to assign modality-specific…
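
The general idea of fusing two encoder branches through latent distributions can be illustrated with a toy example. The sketch below is not the paper's CVF module: the abstract is truncated before the details, so everything here is an assumption made for illustration. Each branch is treated as a diagonal Gaussian (a mean vector and a log-variance vector), a latent sample is drawn with the reparameterization trick, and the branch weights come from a softmax over a hypothetical confidence score (the negative mean variance). The names `fuse_branches`, `reparameterize`, and the scoring rule are all invented for this example.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def reparameterize(mu, logvar, rng):
    """Sample z = mu + sigma * eps with eps ~ N(0, 1), per element."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, logvar)]


def fuse_branches(vit_mu, cnn_mu, vit_logvar, cnn_logvar, rng=None):
    """Toy two-branch fusion (illustrative assumption, not the paper's CVF).

    Returns the fused feature vector and the (vit, cnn) branch weights.
    """
    if rng is None:
        rng = random.Random(0)  # fixed seed so the sketch is reproducible

    # Draw one latent sample per branch.
    z_vit = reparameterize(vit_mu, vit_logvar, rng)
    z_cnn = reparameterize(cnn_mu, cnn_logvar, rng)

    # Hypothetical confidence score: lower mean variance -> higher score.
    score_vit = -sum(math.exp(lv) for lv in vit_logvar) / len(vit_logvar)
    score_cnn = -sum(math.exp(lv) for lv in cnn_logvar) / len(cnn_logvar)
    w_vit, w_cnn = softmax([score_vit, score_cnn])

    # Weighted sum of the two latent samples.
    fused = [w_vit * a + w_cnn * b for a, b in zip(z_vit, z_cnn)]
    return fused, (w_vit, w_cnn)


if __name__ == "__main__":
    fused, weights = fuse_branches(
        vit_mu=[1.0, 2.0], cnn_mu=[3.0, 4.0],
        vit_logvar=[-2.0, -2.0], cnn_logvar=[0.0, 0.0],
    )
    print("weights (vit, cnn):", weights)
    print("fused features:", fused)
```

In this toy setup the branch with the smaller variance receives the larger weight, and the two weights always sum to one. A real model would learn the means, variances, and attention scores from image features rather than take them as fixed inputs.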

Taxonomy

Topics: Advanced X-ray and CT Imaging