RefSAM3D: Adapting SAM with Cross-modal Reference for 3D Medical Image Segmentation
Xiang Gao, Kai Lu

TL;DR
RefSAM3D extends the Segment Anything Model to 3D medical images by incorporating a 3D adapter, cross-modal prompts, and hierarchical attention, significantly improving segmentation accuracy in volumetric medical data.
Contribution
This work introduces a novel adaptation of SAM for 3D medical imaging, including a 3D image adapter and cross-modal prompt generation, enabling effective segmentation of complex anatomical structures.
Findings
Outperforms state-of-the-art segmentation methods on multiple datasets.
Effectively captures multi-scale information with hierarchical attention.
Achieves higher accuracy and consistency in 3D medical image segmentation.
Abstract
The Segment Anything Model (SAM), originally built on a 2D Vision Transformer (ViT), excels at capturing global patterns in 2D natural images but struggles with 3D medical imaging modalities like CT and MRI. These modalities require capturing spatial information in volumetric space for tasks such as organ segmentation and tumor quantification. To address this challenge, we introduce RefSAM3D, which adapts SAM for 3D medical imaging by incorporating a 3D image adapter and cross-modal reference prompt generation. Our approach modifies the visual encoder to handle 3D inputs and enhances the mask decoder for direct 3D mask generation. We also integrate textual prompts to improve segmentation accuracy and consistency in complex anatomical scenarios. By employing a hierarchical attention mechanism, our model effectively captures and integrates information across different scales. Extensive…
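To make the idea of a visual encoder that handles 3D inputs more concrete, the following is a minimal sketch of tokenizing a volume into non-overlapping 3D patches, the volumetric analogue of the 2D patches a ViT consumes. It is not RefSAM3D's adapter: the function name `patchify_3d`, the toy 4×4×4 volume, and the 2×2×2 patch size are all assumptions chosen for illustration, and the abstract does not describe the actual design at this level of detail.

```python
from itertools import product


def patchify_3d(volume, patch):
    """Split a D x H x W nested-list volume into non-overlapping 3D patches.

    Hypothetical helper for illustration only. Returns a list of tokens,
    each a flat list of pd * ph * pw voxel values. Patches are visited
    depth-first, then by height, then by width.
    """
    d, h, w = len(volume), len(volume[0]), len(volume[0][0])
    pd, ph, pw = patch
    if d % pd or h % ph or w % pw:
        raise ValueError("volume dimensions must be divisible by the patch size")

    tokens = []
    # Iterate over the corner coordinate of every patch in the volume.
    for z0, y0, x0 in product(range(0, d, pd), range(0, h, ph), range(0, w, pw)):
        token = [
            volume[z][y][x]
            for z in range(z0, z0 + pd)
            for y in range(y0, y0 + ph)
            for x in range(x0, x0 + pw)
        ]
        tokens.append(token)
    return tokens


# Toy 4 x 4 x 4 volume whose voxel value encodes its own position.
volume = [[[z * 16 + y * 4 + x for x in range(4)] for y in range(4)] for z in range(4)]
tokens = patchify_3d(volume, (2, 2, 2))
print(len(tokens), len(tokens[0]))  # prints: 8 8
```

Each token here mixes voxels from two adjacent slices, which is the property a slice-by-slice 2D tokenizer lacks. In a real model each token would then be linearly projected to an embedding, a step this sketch omits.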
Taxonomy
Topics: Medical Image Segmentation Techniques · Advanced Neural Network Applications
Methods: Attention Is All You Need · Adam · Dropout · Position-Wise Feed-Forward Layer · Softmax · Dense Connections · Byte Pair Encoding · Linear Layer · Multi-Head Attention · Label Smoothing
