MedSAM-CA: A CNN-Augmented ViT with Attention-Enhanced Multi-Scale Fusion for Medical Image Segmentation

Peiting Tian; Xi Chen; Haixia Bi; Fan Li

arXiv:2506.23700 · eess.IV · July 1, 2025

TL;DR

MedSAM-CA enhances medical image segmentation by fine-tuning a pretrained foundation model with novel boundary refinement and multi-scale feature fusion components, achieving high accuracy even with limited training data.

Contribution

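The paper introduces MedSAM-CA, an architecture-level fine-tuning approach that adapts a pretrained foundation model to medical image segmentation. It augments the ViT with a CNN for boundary refinement and adds attention-enhanced multi-scale feature fusion, with the aim of improving accuracy while reducing reliance on large annotated datasets.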

Findings

01

Achieves 94.43% Dice on dermoscopy images when trained on only 2% of the full training data.

02

With that 2% subset, reaches 97.25% of the performance obtained with full-data training, demonstrating effectiveness in low-resource settings.

03

Validates the approach across dermoscopy, CT, and MRI datasets.
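
The Dice figures above can be made concrete with a small sketch. The Python below is not from the paper: `dice_score` is a hypothetical helper implementing the standard Dice coefficient, 2|A∩B| / (|A| + |B|), on flat binary masks, and the toy masks are invented for illustration. The last line assumes the reported 97.25% is the ratio of the 2%-data Dice to the full-data Dice, in which case the two figures would imply a full-data Dice of roughly 97.1%. The paper's own tables remain the authority on that number.

```python
from typing import Sequence


def dice_score(pred: Sequence[int], target: Sequence[int]) -> float:
    """Dice coefficient 2|A∩B| / (|A| + |B|) for two flat binary masks.

    Hypothetical helper for illustration; not the paper's evaluation code.
    """
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of pixels")
    # Pixels marked foreground in both masks.
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    # Total foreground pixels across the two masks.
    total = sum(1 for p in pred if p) + sum(1 for t in target if t)
    # Assumed convention: two empty masks count as a perfect match.
    if total == 0:
        return 1.0
    return 2.0 * intersection / total


# Toy 1-D masks (made up): 2 overlapping pixels, 3 foreground pixels each.
pred = [0, 1, 1, 1, 0, 0]
target = [0, 0, 1, 1, 1, 0]
score = dice_score(pred, target)  # 2 * 2 / (3 + 3)

# Assumption: 97.25% is (Dice with 2% data) / (Dice with full data).
implied_full_data_dice = 94.43 / 0.9725

print(f"toy Dice: {score:.4f}")
print(f"implied full-data Dice: {implied_full_data_dice:.2f}%")
```

Dice weights the overlap against the combined size of both masks, so it penalizes boundary errors on small lesions more heavily than plain pixel accuracy would.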

Abstract

Medical image segmentation plays a crucial role in clinical diagnosis and treatment planning, where accurate boundary delineation is essential for precise lesion localization, organ identification, and quantitative assessment. In recent years, deep learning-based methods have significantly advanced segmentation accuracy. However, two major challenges remain. First, the performance of these methods heavily relies on large-scale annotated datasets, which are often difficult to obtain in medical scenarios due to privacy concerns and high annotation costs. Second, clinically challenging scenarios, such as low contrast in certain imaging modalities and blurry lesion boundaries caused by malignancy, still pose obstacles to precise segmentation. To address these challenges, we propose MedSAM-CA, an architecture-level fine-tuning approach that mitigates reliance on extensive manual annotations…
