Zero-Shot Surgical Tool Segmentation in Monocular Video Using Segment Anything Model 2
Ange Lou, Yamin Li, Yike Zhang, Robert F. Labadie, Jack Noble

TL;DR
This paper evaluates the zero-shot surgical tool segmentation performance of the Segment Anything Model 2 (SAM 2) on diverse surgical videos, highlighting its strengths and limitations in real-world surgical scenarios.
Contribution
The study demonstrates SAM 2's effectiveness in zero-shot surgical tool segmentation and identifies challenges specific to surgical video analysis.
Findings
SAM 2 performs well across different types of surgical video (e.g., endoscopy and microscopy)
Additional prompts improve segmentation accuracy for new tools
Surgical video challenges affect SAM 2's robustness
Abstract
The Segment Anything Model 2 (SAM 2) is the latest-generation foundation model for image and video segmentation. Trained on the expansive Segment Anything Video (SA-V) dataset, which comprises 35.5 million masks across 50.9K videos, SAM 2 advances its predecessor's capabilities by supporting zero-shot segmentation through various prompts (e.g., points, boxes, and masks). Its robust zero-shot performance and efficient memory usage make SAM 2 particularly appealing for surgical tool segmentation in videos, especially given the scarcity of labeled data and the diversity of surgical procedures. In this study, we evaluate the zero-shot video segmentation performance of the SAM 2 model across different types of surgeries, including endoscopy and microscopy. We also assess its performance on videos of varying lengths featuring single and multiple tools to demonstrate SAM 2's applicability and…
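The abstract describes prompting SAM 2 with points, boxes, or masks and then scoring its zero-shot video segmentation, but it does not say how prompts are derived or which metrics are reported. The sketch below is only an illustration under stated assumptions: it derives a box prompt and a single point prompt from a ground-truth binary mask, then scores per-frame predictions with Dice and IoU (common choices, assumed here) averaged over a video. All function names are hypothetical helpers written for this example. They are not part of the SAM 2 API and not the authors' evaluation code.

```python
import numpy as np


def mask_to_box(mask: np.ndarray) -> np.ndarray:
    """Hypothetical helper: tight box [x_min, y_min, x_max, y_max] around a binary mask."""
    ys, xs = np.nonzero(mask)  # row (y) and column (x) indices of foreground pixels
    if ys.size == 0:
        raise ValueError("mask is empty; no box prompt can be derived")
    return np.array([xs.min(), ys.min(), xs.max(), ys.max()])


def mask_to_point(mask: np.ndarray) -> np.ndarray:
    """Hypothetical helper: (x, y) of the foreground pixel nearest the mask centroid.

    Snapping to a foreground pixel keeps the point inside non-convex tool shapes.
    """
    ys, xs = np.nonzero(mask)
    if ys.size == 0:
        raise ValueError("mask is empty; no point prompt can be derived")
    cy, cx = ys.mean(), xs.mean()
    idx = np.argmin((ys - cy) ** 2 + (xs - cx) ** 2)
    return np.array([xs[idx], ys[idx]])


def dice(pred: np.ndarray, gt: np.ndarray) -> float:
    """Dice = 2|P ∩ G| / (|P| + |G|); defined as 1.0 when both masks are empty."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    denom = pred.sum() + gt.sum()
    if denom == 0:
        return 1.0
    return float(2.0 * np.logical_and(pred, gt).sum() / denom)


def iou(pred: np.ndarray, gt: np.ndarray) -> float:
    """IoU = |P ∩ G| / |P ∪ G|; defined as 1.0 when both masks are empty."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    union = np.logical_or(pred, gt).sum()
    if union == 0:
        return 1.0
    return float(np.logical_and(pred, gt).sum() / union)


def video_scores(preds, gts):
    """Mean Dice and mean IoU over paired per-frame predicted and ground-truth masks."""
    dices = [dice(p, g) for p, g in zip(preds, gts)]
    ious = [iou(p, g) for p, g in zip(preds, gts)]
    return float(np.mean(dices)), float(np.mean(ious))


if __name__ == "__main__":
    gt = np.zeros((6, 6), dtype=bool)
    gt[1:4, 2:5] = True          # 3x3 "tool" in the ground truth
    pred = np.zeros_like(gt)
    pred[1:4, 3:6] = True        # same shape, shifted one column right
    print(mask_to_box(gt), mask_to_point(gt))
    print(dice(pred, gt), iou(pred, gt))
    print(video_scores([pred, gt], [gt, gt]))
```

In this toy case the shifted prediction overlaps 6 of 9 pixels, giving Dice 2/3 and IoU 0.5. In a real pipeline the prompts would be passed to SAM 2 on an annotated frame and the propagated masks compared against ground truth frame by frame; how the paper handles tools that appear later in the video is not specified here.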
Taxonomy
Topics: Digital Imaging in Medicine · Surgical Simulation and Training
Methods: Segment Anything Model
