SASVi -- Segment Any Surgical Video
Ssharvien Kumar Sivakumar, Yannik Frisch, Amin Ranem, Anirban Mukhopadhyay

TL;DR
SASVi introduces a re-prompting mechanism using a Mask R-CNN Overseer to improve temporal consistency in surgical video segmentation, enabling effective deployment of foundation models with minimal annotations.
Contribution
The paper presents a novel re-prompting approach with an Overseer model that enhances temporal segmentation consistency in surgical videos using limited annotated data.
Findings
Significant improvement in temporal consistency over existing methods.
Successful application of SAM2 to various surgical datasets.
Public release of extensive surgical video annotations.
Abstract
Purpose: Foundation models, trained on multitudes of public datasets, often require additional fine-tuning or re-prompting mechanisms to be applied to visually distinct target domains such as surgical videos. Further, without domain knowledge, they cannot model the specific semantics of the target domain. Hence, when applied to surgical video segmentation, they fail to generalise to sections where previously tracked objects leave the scene or new objects enter.
Methods: We propose SASVi, a novel re-prompting mechanism based on a frame-wise Mask R-CNN Overseer model, which is trained on a minimal amount of scarcely available annotations for the target domain. This model automatically re-prompts the foundation model SAM2 when the scene constellation changes, allowing for temporally smooth and complete segmentation of full surgical videos.
Results: Re-prompting based on our Overseer model…
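The re-prompting idea in the Methods paragraph can be illustrated with a small sketch. It is not the authors' implementation: it assumes the Overseer's per-frame output has already been reduced to a set of class IDs, and it treats a "scene constellation change" as that set differing from the set currently being tracked. The function name `find_reprompt_frames` and the `patience` parameter are hypothetical, introduced only for illustration; no SAM2 or Mask R-CNN API is called.

```python
from typing import List, Sequence, Set, Tuple


def find_reprompt_frames(
    overseer_classes: Sequence[Set[int]],
    patience: int = 1,
) -> List[Tuple[int, Set[int]]]:
    """Return (frame_index, class_set) pairs at which a tracker would be re-prompted.

    overseer_classes: one set of class IDs per frame (assumed Overseer output).
    patience: number of consecutive mismatching frames required before
        re-prompting (a hypothetical smoothing knob, not taken from the paper).
    """
    if not overseer_classes:
        return []

    # The first frame always provides the initial prompt.
    tracked = set(overseer_classes[0])
    events = [(0, set(tracked))]
    mismatch_run = 0

    for idx in range(1, len(overseer_classes)):
        current = set(overseer_classes[idx])
        if current != tracked:
            # An object left the scene or a new one entered, per the Overseer.
            mismatch_run += 1
            if mismatch_run >= patience:
                tracked = current
                events.append((idx, set(tracked)))
                mismatch_run = 0
        else:
            mismatch_run = 0

    return events


if __name__ == "__main__":
    frames = [{1, 2}, {1, 2}, {1}, {1}, {1, 3}]
    for frame_idx, classes in find_reprompt_frames(frames):
        print(frame_idx, sorted(classes))
```

In a full pipeline, each returned frame index would be a point at which the video segmentation model receives fresh prompts derived from the Overseer's masks. Raising `patience` trades responsiveness for robustness against single-frame Overseer errors.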
Taxonomy
Topics: Aortic Thrombus and Embolism · Venous Thromboembolism Diagnosis and Management
Methods: Region Proposal Network · Convolution · RoIAlign · Softmax · Mask R-CNN
