Surgical-DeSAM: Decoupling SAM for Instrument Segmentation in Robotic Surgery
Yuyang Sheng, Sophia Bano, Matthew J. Clarkson, Mobarakol Islam

TL;DR
Surgical-DeSAM introduces an automatic prompt generation method using DETR and Swin-transformer to enable real-time, prompt-free instrument segmentation in robotic surgery, outperforming existing methods.
Contribution
It presents a novel approach that decouples SAM by using a detection architecture and transformers to achieve prompt-free, real-time surgical instrument segmentation.
Findings
Achieved Dice scores of 89.62 and 90.70 on the EndoVis 2017 and 2018 datasets, respectively.
Outperformed state-of-the-art segmentation methods.
Enabled prompt-free, real-time segmentation in surgical applications.
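For readers unfamiliar with the metric behind the reported numbers, the following is a minimal sketch of the standard Dice coefficient for a pair of binary masks, 2|A∩B| / (|A| + |B|), scaled to a percentage. The function name `dice_score`, the nested-list mask format, and the choice to return 1.0 when both masks are empty are illustrative assumptions; the paper's own evaluation code and averaging protocol are not shown in this summary.

```python
def dice_score(pred, target):
    """Dice coefficient between two binary masks given as nested lists of 0/1."""
    # Flatten both masks and coerce every pixel to 0 or 1.
    pred_flat = [int(bool(v)) for row in pred for v in row]
    target_flat = [int(bool(v)) for row in target for v in row]
    if len(pred_flat) != len(target_flat):
        raise ValueError("masks must have the same number of pixels")

    # Pixels that are foreground in both masks.
    intersection = sum(p & t for p, t in zip(pred_flat, target_flat))
    total = sum(pred_flat) + sum(target_flat)
    if total == 0:
        # Assumption: two empty masks count as a perfect match.
        return 1.0
    return 2.0 * intersection / total


# Toy example: 2 overlapping pixels, 3 foreground pixels in each mask.
pred = [[1, 1, 0],
        [0, 1, 0]]
target = [[1, 0, 0],
          [0, 1, 1]]
print(round(100 * dice_score(pred, target), 2))  # 66.67
```

On this scale, a score near 90 means the predicted and reference instrument masks overlap almost completely.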
Abstract
Purpose: The recent Segment Anything Model (SAM) has demonstrated impressive performance with point, text or bounding box prompts in various applications. However, in safety-critical surgical tasks, prompting is not possible due to (i) the lack of per-frame prompts for supervised learning, (ii) the impracticality of prompting frame-by-frame in a real-time tracking application, and (iii) the expense of annotating prompts for offline applications. Methods: We develop Surgical-DeSAM to generate automatic bounding box prompts for decoupling SAM to obtain instrument segmentation in real-time robotic surgery. We utilise a commonly used detection architecture, DETR, and fine-tune it to obtain bounding box prompts for the instruments. We then employ decoupling SAM (DeSAM) by replacing the image encoder with the DETR encoder and fine-tuning the prompt encoder and mask decoder to obtain instance…
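The hand-off from detector to prompt encoder described in the abstract can be illustrated with a small sketch. It assumes the conventions of the public reference implementations: DETR predicts boxes as normalized (center x, center y, width, height), and SAM accepts box prompts as absolute (x0, y0, x1, y1) pixel coordinates. Whether Surgical-DeSAM performs exactly this conversion is an assumption, and the helper name `detr_boxes_to_box_prompts` and the 0.5 confidence threshold are hypothetical.

```python
def detr_boxes_to_box_prompts(boxes_cxcywh, scores, image_width, image_height,
                              score_threshold=0.5):
    """Convert normalized (cx, cy, w, h) detections to pixel (x0, y0, x1, y1) box prompts.

    Detections scoring below `score_threshold` (a hypothetical default) are dropped.
    """
    prompts = []
    for (cx, cy, w, h), score in zip(boxes_cxcywh, scores):
        if score < score_threshold:
            continue  # skip low-confidence detections
        # Center/size form to corner form, scaled to pixel units.
        x0 = (cx - w / 2.0) * image_width
        y0 = (cy - h / 2.0) * image_height
        x1 = (cx + w / 2.0) * image_width
        y1 = (cy + h / 2.0) * image_height
        # Clamp to the image bounds so the prompt never leaves the frame.
        x0 = min(max(x0, 0.0), float(image_width))
        y0 = min(max(y0, 0.0), float(image_height))
        x1 = min(max(x1, 0.0), float(image_width))
        y1 = min(max(y1, 0.0), float(image_height))
        prompts.append((x0, y0, x1, y1))
    return prompts


# Two toy detections on a 100x200 (width x height) frame; the second is low-confidence.
boxes = [(0.5, 0.5, 0.25, 0.5), (0.1, 0.1, 0.1, 0.1)]
scores = [0.9, 0.2]
print(detr_boxes_to_box_prompts(boxes, scores, image_width=100, image_height=200))
# [(37.5, 50.0, 62.5, 150.0)]
```

Because the boxes come from the detector rather than a human, no per-frame manual prompt is needed at inference time, which is the property the abstract emphasises.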
Taxonomy
Topics: Soft Robotics and Applications · Advanced X-ray and CT Imaging · Anatomy and Medical Technology
Methods: Attention Is All You Need · Position-Wise Feed-Forward Layer · Byte Pair Encoding · Absolute Position Encodings · Feedforward Network · Dropout · Dense Connections · Label Smoothing · Residual Connection · Softmax
