RefSAM3D: Adapting SAM with Cross-modal Reference for 3D Medical Image Segmentation

Xiang Gao; Kai Lu

arXiv:2412.05605 · cs.CV · December 10, 2024

Open Access

TL;DR

RefSAM3D extends the Segment Anything Model to 3D medical images by incorporating a 3D adapter, cross-modal prompts, and hierarchical attention, significantly improving segmentation accuracy in volumetric medical data.

Contribution

This work introduces a novel adaptation of SAM for 3D medical imaging, including a 3D image adapter and cross-modal prompt generation, enabling effective segmentation of complex anatomical structures.
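
This summary does not specify how the cross-modal reference prompts are generated. Purely as an illustration, here is a minimal NumPy sketch under assumed choices: a text embedding (from some text encoder, not specified here) is projected into the image-token space and refined by single-head scaled dot-product cross-attention over the flattened tokens of a 3D volume, producing prompt tokens a mask decoder could consume. The function `text_to_prompt`, all weight matrices, and all sizes are hypothetical and are not RefSAM3D's published design.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row max for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def text_to_prompt(text_emb, image_tokens, w_t, w_q, w_k, w_v):
    """Hypothetical cross-modal reference prompt (illustrative only).

    text_emb:     (T, Dt) embeddings from an assumed text encoder.
    image_tokens: (N, C) flattened tokens of a 3D volume.
    Returns (T, C) prompt tokens.
    """
    t = text_emb @ w_t                                # project text into image-token space
    q, k, v = t @ w_q, image_tokens @ w_k, image_tokens @ w_v
    attn = softmax(q @ k.T / np.sqrt(q.shape[-1]))   # (T, N) attention over volume tokens
    return t + attn @ v                               # residual: text prior + image evidence

rng = np.random.default_rng(0)
T, Dt, N, C = 1, 512, 256, 32   # e.g. one phrase such as "liver"; sizes are illustrative
w_q, w_k, w_v = (rng.standard_normal((C, C)) / np.sqrt(C) for _ in range(3))
prompt = text_to_prompt(rng.standard_normal((T, Dt)), rng.standard_normal((N, C)),
                        rng.standard_normal((Dt, C)) / np.sqrt(Dt), w_q, w_k, w_v)
print(prompt.shape)  # (1, 32)
```

The residual connection keeps the projected text embedding as a prior, while the attention term pulls in evidence from the volume tokens that best match the query.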

Findings

01

Outperforms state-of-the-art segmentation methods on multiple datasets.

02

Effectively captures multi-scale information with hierarchical attention.

03

Achieves higher accuracy and consistency in 3D medical image segmentation.

Abstract

The Segment Anything Model (SAM), originally built on a 2D Vision Transformer (ViT), excels at capturing global patterns in 2D natural images but struggles with 3D medical imaging modalities like CT and MRI. These modalities require capturing spatial information in volumetric space for tasks such as organ segmentation and tumor quantification. To address this challenge, we introduce RefSAM3D, which adapts SAM for 3D medical imaging by incorporating a 3D image adapter and cross-modal reference prompt generation. Our approach modifies the visual encoder to handle 3D inputs and enhances the mask decoder for direct 3D mask generation. We also integrate textual prompts to improve segmentation accuracy and consistency in complex anatomical scenarios. By employing a hierarchical attention mechanism, our model effectively captures and integrates information across different scales. Extensive…
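
To make the idea of a "3D image adapter" concrete, the following is a minimal NumPy sketch of one common adapter pattern, assumed here rather than taken from the paper: ViT tokens of a volume are down-projected, reshaped back onto their (D, H, W) grid, mixed across neighbouring slices with a depthwise 3D convolution, passed through GELU, up-projected, and added back residually to the frozen encoder features. The names `adapter_3d` and `depthwise_conv3d` and all sizes are hypothetical.

```python
import numpy as np

def gelu(x):
    # tanh approximation of GELU
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

def depthwise_conv3d(x, kernel):
    """Depthwise 3D convolution (cross-correlation) with zero 'same' padding.

    x: (D, H, W, C) feature grid; kernel: (k, k, k, C) with odd k.
    """
    k = kernel.shape[0]
    p = k // 2
    xp = np.pad(x, ((p, p), (p, p), (p, p), (0, 0)))
    D, H, W, _ = x.shape
    out = np.zeros_like(x, dtype=float)
    for i in range(k):
        for j in range(k):
            for m in range(k):
                # each channel is filtered by its own k*k*k kernel
                out += xp[i:i + D, j:j + H, m:m + W, :] * kernel[i, j, m, :]
    return out

def adapter_3d(tokens, grid, w_down, kernel, w_up):
    """Hypothetical bottleneck adapter for ViT tokens of a volume.

    tokens: (N, C) with N = D*H*W; grid: (D, H, W).
    """
    h = tokens @ w_down                                  # (N, r) down-projection
    h = depthwise_conv3d(h.reshape(*grid, -1), kernel)   # mix neighbouring slices/voxels
    h = gelu(h).reshape(len(tokens), -1)                 # back to (N, r)
    return tokens + h @ w_up                             # residual around frozen features

rng = np.random.default_rng(0)
D, H, W, C, r = 4, 8, 8, 32, 8
tokens = rng.standard_normal((D * H * W, C))
out = adapter_3d(tokens, (D, H, W),
                 rng.standard_normal((C, r)) * 0.02,
                 rng.standard_normal((3, 3, 3, r)) * 0.1,
                 np.zeros((r, C)))                       # zero-initialised up-projection
print(out.shape)  # (256, 32)
```

Zero-initialising the up-projection makes the adapter an identity map at the start of fine-tuning, so the pretrained SAM features pass through unchanged until the adapter learns a useful correction; this is a widespread adapter convention, assumed here for illustration.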

Taxonomy

Topics: Medical Image Segmentation Techniques · Advanced Neural Network Applications

Methods: Attention Is All You Need · Adam · Dropout · Position-Wise Feed-Forward Layer · Softmax · Dense Connections · Byte Pair Encoding · Linear Layer · Multi-Head Attention · Label Smoothing