CRISP-SAM2: SAM2 with Cross-Modal Interaction and Semantic Prompting for Multi-Organ Segmentation
Xinlei Yu, Changmiao Wang, Hui Jin, Ahmed Elazab, Gangyong Jia, Xiang Wan, Changqing Zou, Ruiquan Ge

TL;DR
CRISP-SAM2 is a novel multi-organ medical segmentation model that leverages cross-modal interaction and semantic prompting to improve detail accuracy and reduce reliance on geometric prompts, outperforming existing models.
Contribution
The paper introduces CRISP-SAM2, a new model that integrates cross-modal semantics and semantic prompting to enhance multi-organ segmentation in medical images.
Findings
Outperforms existing models on seven public datasets.
Effectively incorporates cross-modal interaction for better segmentation.
Reduces dependence on geometric prompts through semantic prompting.
Abstract
Multi-organ medical segmentation is a crucial component of medical image processing, essential for doctors to make accurate diagnoses and develop effective treatment plans. Despite significant progress in this field, current multi-organ segmentation models often suffer from inaccurate details, dependence on geometric prompts and loss of spatial information. Addressing these challenges, we introduce a novel model named CRISP-SAM2 with CRoss-modal Interaction and Semantic Prompting based on SAM2. This model represents a promising approach to multi-organ medical segmentation guided by textual descriptions of organs. Our method begins by converting visual and textual inputs into cross-modal contextualized semantics using a progressive cross-attention interaction mechanism. These semantics are then injected into the image encoder to enhance the detailed understanding of visual information.…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Code & Models
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsAdvanced Neural Network Applications · Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning
