CRISP-SAM2: SAM2 with Cross-Modal Interaction and Semantic Prompting for Multi-Organ Segmentation

Xinlei Yu; Changmiao Wang; Hui Jin; Ahmed Elazab; Gangyong Jia; Xiang Wan; Changqing Zou; Ruiquan Ge

arXiv:2506.23121 · eess.IV · July 15, 2025

TL;DR

CRISP-SAM2 is a SAM2-based model for multi-organ medical segmentation guided by textual descriptions of organs. It uses cross-modal interaction and semantic prompting to improve detail accuracy and reduce reliance on geometric prompts, and it is reported to outperform existing models on seven public datasets.

Contribution

The paper introduces CRISP-SAM2, a SAM2-based model that combines cross-modal interaction with semantic prompting to improve multi-organ segmentation in medical images.

Findings

01 Outperforms existing models on seven public datasets.

02 Uses cross-modal interaction between visual and textual inputs to improve segmentation detail.

03 Reduces dependence on geometric prompts through semantic prompting.

Abstract

Multi-organ medical segmentation is a crucial component of medical image processing, essential for doctors to make accurate diagnoses and develop effective treatment plans. Despite significant progress in this field, current multi-organ segmentation models often suffer from inaccurate details, dependence on geometric prompts and loss of spatial information. Addressing these challenges, we introduce a novel model named CRISP-SAM2 with CRoss-modal Interaction and Semantic Prompting based on SAM2. This model represents a promising approach to multi-organ medical segmentation guided by textual descriptions of organs. Our method begins by converting visual and textual inputs into cross-modal contextualized semantics using a progressive cross-attention interaction mechanism. These semantics are then injected into the image encoder to enhance the detailed understanding of visual information.…
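
The abstract excerpt above does not give the architecture of the progressive cross-attention interaction mechanism, so the following is only a minimal NumPy sketch of the general idea: visual tokens and text tokens repeatedly attend to each other over several stages. Everything in it is an assumption made for illustration, not the authors' implementation: the function names (`cross_attention`, `progressive_interaction`), the number of stages, the residual updates, the token shapes, and the omission of learned query/key/value projections.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attention(queries, context):
    """Scaled dot-product attention of `queries` over `context`.

    Simplification (assumption): no learned projections; the context
    tokens serve as both keys and values.
    """
    d = queries.shape[-1]
    scores = queries @ context.T / np.sqrt(d)   # (n_queries, n_context)
    weights = softmax(scores, axis=-1)          # each row sums to 1
    return weights @ context, weights


def progressive_interaction(visual, text, num_stages=2):
    """Hypothetical staged interaction: each stage updates the visual
    tokens from the text, then the text tokens from the updated visual
    tokens, using residual additions."""
    for _ in range(num_stages):
        visual_update, _ = cross_attention(visual, text)
        visual = visual + visual_update
        text_update, _ = cross_attention(text, visual)
        text = text + text_update
    return visual, text


# Toy inputs: 16 image-patch tokens and 4 text tokens, 8 dimensions each.
rng = np.random.default_rng(0)
visual_tokens = rng.normal(size=(16, 8))
text_tokens = rng.normal(size=(4, 8))

fused_visual, fused_text = progressive_interaction(visual_tokens, text_tokens)
print(fused_visual.shape, fused_text.shape)
```

In a sketch like this, the fused tokens keep the shapes of their inputs, which is what would let such semantics be passed on to an image encoder or a prompt generator downstream; how CRISP-SAM2 actually does this is described in the paper and the official repository, not here.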

Code & Models

Repositories

yu-deep/crisp_sam2 (PyTorch, official)

Taxonomy

Topics: Advanced Neural Network Applications · Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning