CC-SAM: SAM with Cross-feature Attention and Context for Ultrasound Image Segmentation
Shreyank N Gowda, David A. Clifton

TL;DR
This paper enhances the Segment Anything Model (SAM) for ultrasound image segmentation by integrating a CNN branch, variational attention, and ChatGPT-generated prompts, significantly improving its performance on complex medical images.
Contribution
The paper introduces a variational attention fusion module that combines CNN and ViT features, adds feature and position adapters, and uses ChatGPT to generate text prompts, together adapting SAM to medical imaging tasks.
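The adapter idea can be illustrated with a minimal sketch of a feature adapter. It assumes the common bottleneck design (down-projection, nonlinearity, up-projection, residual connection) and assumes that only the adapter weights are trained while the pretrained encoder stays frozen. This summary does not specify the paper's actual feature or position adapter architecture, so `BottleneckAdapter`, `gelu`, and the sizes used here are hypothetical.

```python
import numpy as np


def gelu(x: np.ndarray) -> np.ndarray:
    # tanh approximation of the GELU activation.
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))


class BottleneckAdapter:
    """Hypothetical feature adapter: down-project, GELU, up-project, residual add.

    Assumption: only these weights are trained; the pretrained encoder stays frozen.
    """

    def __init__(self, dim: int, bottleneck: int, rng: np.random.Generator):
        self.w_down = rng.standard_normal((dim, bottleneck)) * 0.02
        self.b_down = np.zeros(bottleneck)
        # Zero-initialised up-projection makes the adapter an identity map at the
        # start, so pretrained features pass through unchanged before training.
        self.w_up = np.zeros((bottleneck, dim))
        self.b_up = np.zeros(dim)

    def __call__(self, x: np.ndarray) -> np.ndarray:
        h = gelu(x @ self.w_down + self.b_down)   # (tokens, bottleneck)
        return x + h @ self.w_up + self.b_up      # (tokens, dim), residual add

    def num_params(self) -> int:
        return sum(p.size for p in (self.w_down, self.b_down, self.w_up, self.b_up))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    adapter = BottleneckAdapter(dim=768, bottleneck=64, rng=rng)  # illustrative sizes
    tokens = rng.standard_normal((196, 768))
    print(np.allclose(adapter(tokens), tokens))  # True: identity at initialisation
    print(adapter.num_params())                  # 99136
```

The zero-initialised up-projection is a common choice so that inserting the adapter does not disturb the pretrained model at the start of fine-tuning. With dim=768 and bottleneck=64 the adapter adds about 99k parameters, compared with roughly 590k for a single full 768 x 768 linear layer.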
Findings
Improved segmentation accuracy on ultrasound images
Effective integration of CNN and ViT encoders
ChatGPT-generated prompts enhance model understanding
Abstract
The Segment Anything Model (SAM) has achieved remarkable successes in the realm of natural image segmentation, but its deployment in the medical imaging sphere has encountered challenges. Specifically, the model struggles with medical images that feature low contrast, faint boundaries, intricate morphologies, and small-sized objects. To address these challenges and enhance SAM's performance in the medical domain, we introduce a comprehensive modification. Firstly, we incorporate a frozen Convolutional Neural Network (CNN) branch as an image encoder, which synergizes with SAM's original Vision Transformer (ViT) encoder through a novel variational attention fusion module. This integration bolsters the model's capability to capture local spatial information, which is often paramount in medical imagery. Moreover, to further optimize SAM for medical imaging, we introduce feature and position…
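The fusion step described in the abstract lets ViT tokens draw on local spatial detail from the CNN branch. The sketch below assumes plain single-head scaled dot-product cross-attention, with ViT tokens as queries, flattened CNN features as keys and values, and a residual add. It is not the paper's variational attention, whose details are not given here, and the names `flatten_feature_map`, `cross_feature_fusion`, and the weight matrices are hypothetical.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    # Subtract the max for numerical stability before exponentiating.
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def flatten_feature_map(fmap: np.ndarray) -> np.ndarray:
    """Turn a (C, H, W) CNN feature map into (H*W, C) tokens, one per pixel."""
    c = fmap.shape[0]
    return fmap.reshape(c, -1).T


def cross_feature_fusion(vit_tokens, cnn_tokens, w_q, w_k, w_v, w_o):
    """Hypothetical fusion: plain cross-attention, NOT the paper's variational attention.

    vit_tokens: (N, d) ViT tokens, used as queries.
    cnn_tokens: (M, c) flattened CNN features, used as keys and values.
    Returns the fused (N, d) tokens and the (N, M) attention weights.
    """
    q = vit_tokens @ w_q                      # (N, d_k)
    k = cnn_tokens @ w_k                      # (M, d_k)
    v = cnn_tokens @ w_v                      # (M, d_k)
    scores = q @ k.T / np.sqrt(q.shape[-1])   # scaled dot products, (N, M)
    attn = softmax(scores, axis=-1)           # each ViT token's weights over CNN positions
    fused = vit_tokens + (attn @ v) @ w_o     # residual: ViT features plus CNN context
    return fused, attn


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, d, c, h, w, dk = 16, 32, 8, 8, 8, 16   # illustrative sizes only
    vit = rng.standard_normal((n, d))
    cnn = flatten_feature_map(rng.standard_normal((c, h, w)))  # (64, 8)
    w_q = rng.standard_normal((d, dk)) * 0.1
    w_k = rng.standard_normal((c, dk)) * 0.1
    w_v = rng.standard_normal((c, dk)) * 0.1
    w_o = rng.standard_normal((dk, d)) * 0.1
    fused, attn = cross_feature_fusion(vit, cnn, w_q, w_k, w_v, w_o)
    print(fused.shape, attn.shape)  # (16, 32) (16, 64)
```

In this sketch, using ViT tokens as queries keeps the output at the ViT token count, so the fused features keep the same shape as the original ViT features. Setting the output projection to zero reduces the module to the identity, which is one way to start from the ViT encoder's original behaviour.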
Taxonomy
Topics: Medical Image Segmentation Techniques · Advanced Neural Network Applications · Medical Imaging and Analysis
Methods: Linear Layer · Residual Connection · Multi-Head Attention · Attention Is All You Need · Position-Wise Feed-Forward Layer · Adam · Byte Pair Encoding · Softmax · Absolute Position Encodings · Vision Transformer
