SAM2S: Segment Anything in Surgical Videos via Semantic Long-term Tracking

Haofeng Liu; Ziyue Wang; Sudhanshu Mishra; Mingqi Gao; Guanyi Qin; Chang Han Low; Alex Y. W. Kong; Yueming Jin

arXiv:2511.16618 · cs.CV · November 21, 2025

TL;DR

This paper introduces SAM2S, a foundation model for surgical video segmentation that leverages a new benchmark and novel memory and learning mechanisms to improve long-term tracking and zero-shot generalization in surgical scenarios.

Contribution

The paper presents SAM2S, a novel foundation model for surgical iVOS, built upon the SA-SV surgical benchmark and incorporating DiveMem, temporal semantic learning, and ambiguity-resilient training.

Findings

01. SAM2 fine-tuned on SA-SV improves by 12.99 average J&F points over vanilla SAM2.
02. SAM2S achieves an average J&F of 80.42, surpassing the baselines.
03. SAM2S runs in real time at 68 FPS while maintaining strong zero-shot generalization.
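The findings above are reported as average J&F, the DAVIS-style video segmentation score: J is the region similarity (mask IoU), F is a boundary F-measure, and J&F is their mean. The sketch below is a minimal pure-Python illustration of that metric, not the SA-SV or DAVIS evaluation code. Assumptions: masks are nested 0/1 lists; boundary pixels are matched within a fixed square window of `tol` pixels, whereas the reference DAVIS toolkit scales the tolerance with image size and uses disk dilation; averaging over frames stands in for whatever object and video averaging a benchmark protocol defines. All function and parameter names (`region_jaccard`, `boundary_f`, `j_and_f`, `mean_j_and_f`, `tol`) are hypothetical.

```python
from typing import List, Sequence, Set, Tuple

# Hypothetical helpers: a simplified J&F illustration, not an official evaluation script.
Mask = List[List[int]]  # binary mask: rows of 0/1 values, all rows the same length
Pixel = Tuple[int, int]  # (row, col)


def region_jaccard(pred: Mask, gt: Mask) -> float:
    """J: intersection-over-union between predicted and ground-truth masks."""
    inter = union = 0
    for prow, grow in zip(pred, gt):
        for p, g in zip(prow, grow):
            inter += int(bool(p) and bool(g))
            union += int(bool(p) or bool(g))
    # Two empty masks are treated as a perfect match.
    return 1.0 if union == 0 else inter / union


def boundary(mask: Mask) -> Set[Pixel]:
    """Foreground pixels with at least one 4-neighbour that is background or off-image."""
    h, w = len(mask), len(mask[0]) if mask else 0
    edge: Set[Pixel] = set()
    for y in range(h):
        for x in range(w):
            if not mask[y][x]:
                continue
            for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                ny, nx = y + dy, x + dx
                if not (0 <= ny < h and 0 <= nx < w) or not mask[ny][nx]:
                    edge.add((y, x))
                    break
    return edge


def _matched_fraction(src: Set[Pixel], dst: Set[Pixel], tol: int) -> float:
    """Share of src pixels that have some dst pixel within `tol` (Chebyshev distance)."""
    if not src:
        return 1.0
    hits = sum(
        1
        for (y, x) in src
        if any(abs(y - v) <= tol and abs(x - u) <= tol for (v, u) in dst)
    )
    return hits / len(src)


def boundary_f(pred: Mask, gt: Mask, tol: int = 1) -> float:
    """F: harmonic mean of boundary precision and boundary recall."""
    pb, gb = boundary(pred), boundary(gt)
    precision = _matched_fraction(pb, gb, tol)  # predicted edges near a true edge
    recall = _matched_fraction(gb, pb, tol)  # true edges near a predicted edge
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def j_and_f(pred: Mask, gt: Mask, tol: int = 1) -> float:
    """J&F for one frame: the mean of region J and boundary F."""
    return (region_jaccard(pred, gt) + boundary_f(pred, gt, tol)) / 2


def mean_j_and_f(preds: Sequence[Mask], gts: Sequence[Mask], tol: int = 1) -> float:
    """Average J&F over paired frames (a simplification of benchmark-level averaging)."""
    scores = [j_and_f(p, g, tol) for p, g in zip(preds, gts)]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    gt = [[0, 0, 0, 0], [0, 1, 1, 0], [0, 1, 1, 0], [0, 0, 0, 0]]
    shifted = [[0, 0, 0, 0], [0, 0, 1, 1], [0, 0, 1, 1], [0, 0, 0, 0]]
    # J = 2/6 and F = 1.0 at tol=1, so J&F = (1/3 + 1) / 2
    print(round(j_and_f(shifted, gt), 3))  # → 0.667
```

Shifting a 2×2 square by one column cuts the overlap to J = 1/3, yet at a one-pixel tolerance every boundary pixel still finds a match (F = 1), which is why region and contour accuracy are reported together.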

Abstract

Surgical video segmentation is crucial for computer-assisted surgery, enabling precise localization and tracking of instruments and tissues. Interactive Video Object Segmentation (iVOS) models such as Segment Anything Model 2 (SAM2) provide prompt-based flexibility beyond methods with predefined categories, but face challenges in surgical scenarios due to the domain gap and limited long-term tracking. To address these limitations, we construct SA-SV, the largest surgical iVOS benchmark with instance-level spatio-temporal annotations (masklets) spanning eight procedure types (61k frames, 1.6k masklets), enabling comprehensive development and evaluation for long-term tracking and zero-shot generalization. Building on SA-SV, we propose SAM2S, a foundation model enhancing SAM2 for Surgical iVOS through: (1) DiveMem, a trainable diverse memory mechanism for robust long-term…

Taxonomy

Topics: Surgical Simulation and Training · Advanced Neural Network Applications · Multimodal Machine Learning Applications