SAM 2 in Robotic Surgery: An Empirical Evaluation for Robustness and Generalization in Surgical Video Segmentation
Jieming Yu, An Wang, Wenzhen Dong, Mengya Xu, Mobarakol Islam, Jie Wang, Long Bai, Hongliang Ren

TL;DR
SAM 2 demonstrates strong zero-shot segmentation, robustness, and efficiency in robotic surgery video analysis, outperforming existing methods especially with bounding box prompts and showing resilience to real-world corruption.
Contribution
This study empirically evaluates SAM 2's zero-shot segmentation performance and robustness in robotic surgery, highlighting its advantages over prior models in surgical video segmentation tasks.
Findings
With bounding box prompts, SAM 2 outperforms state-of-the-art (SOTA) methods on the MICCAI EndoVis 2017 and EndoVis 2018 benchmarks.
With point prompts, SAM 2 is substantially more accurate than the original SAM.
SAM 2 shows enhanced robustness to image corruption and faster inference.
Abstract
The recent Segment Anything Model (SAM) 2 has demonstrated remarkable foundational competence in semantic segmentation, with its memory mechanism and mask decoder further addressing challenges in video tracking and object occlusion, thereby achieving superior results in interactive segmentation for both images and videos. Building upon our previous empirical studies, we further explore the zero-shot segmentation performance of SAM 2 in robot-assisted surgery based on prompts, alongside its robustness against real-world corruption. For static images, we employ two forms of prompts: 1-point and bounding box, while for video sequences, the 1-point prompt is applied to the initial frame. Through extensive experimentation on the MICCAI EndoVis 2017 and EndoVis 2018 benchmarks, SAM 2, when utilizing bounding box prompts, outperforms state-of-the-art (SOTA) methods in comparative evaluations.…
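The abstract names two prompt types, a single point and a bounding box, but this page does not say how the prompts were obtained. As a minimal sketch, assuming the prompts are derived from a ground-truth binary mask (a common choice in prompt-based evaluations, not confirmed here for this paper), the hypothetical helpers `box_prompt` and `point_prompt` below compute a tight bounding box and the foreground pixel nearest the mask centroid:

```python
from typing import List, Optional, Tuple

Mask = List[List[int]]  # rows of 0/1 values; mask[y][x] == 1 marks foreground


def _foreground(mask: Mask) -> List[Tuple[int, int]]:
    """Return the (x, y) coordinates of all foreground pixels."""
    return [(x, y) for y, row in enumerate(mask) for x, v in enumerate(row) if v]


def box_prompt(mask: Mask) -> Optional[Tuple[int, int, int, int]]:
    """Tight (x_min, y_min, x_max, y_max) box around the foreground, or None if empty."""
    coords = _foreground(mask)
    if not coords:
        return None
    xs = [x for x, _ in coords]
    ys = [y for _, y in coords]
    return min(xs), min(ys), max(xs), max(ys)


def point_prompt(mask: Mask) -> Optional[Tuple[int, int]]:
    """Foreground pixel closest to the mask centroid as (x, y), or None if empty.

    Snapping to a foreground pixel keeps the point inside the object even
    when the raw centroid falls in a hole or outside a concave shape.
    """
    coords = _foreground(mask)
    if not coords:
        return None
    cx = sum(x for x, _ in coords) / len(coords)
    cy = sum(y for _, y in coords) / len(coords)
    return min(coords, key=lambda p: (p[0] - cx) ** 2 + (p[1] - cy) ** 2)


# Illustrative 4x5 mask (made-up data, not from the benchmarks).
example_mask = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
print(box_prompt(example_mask))    # (1, 1, 3, 2)
print(point_prompt(example_mask))  # (2, 2)
```

For a video sequence, the same point helper would be applied only to the mask of the initial frame, matching the 1-point video setting described in the abstract.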
Taxonomy
Topics: Surgical Simulation and Training · Medical Image Segmentation Techniques · Soft Robotics and Applications
Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings · Segment Anything Model
