ReSurgSAM2: Referring Segment Anything in Surgical Video via Credible Long-term Tracking

Haofeng Liu; Mingqi Gao; Xuxiao Luo; Ziyue Wang; Guanyi Qin; Junde Wu; Yueming Jin

arXiv:2505.08581 · cs.CV · May 14, 2025

TL;DR

ReSurgSAM2 is a two-stage referring segmentation framework for surgical video built on Segment Anything Model 2. It detects the text-referred target, then tracks it with a diversity-driven long-term memory, and runs in real time at 61.2 FPS.

Contribution

It introduces a two-stage framework that pairs a cross-modal spatial-temporal Mamba for text-referred detection with reliable initial frame identification and a diversity-driven long-term memory for tracking in surgical videos.

Findings

01 Achieves real-time performance at 61.2 FPS.

02 Significantly improves segmentation accuracy over existing methods.

03 Demonstrates robustness in complex surgical scenarios.

Abstract

Surgical scene segmentation is critical in computer-assisted surgery and is vital for enhancing surgical quality and patient outcomes. Recently, referring surgical segmentation is emerging, given its advantage of providing surgeons with an interactive experience to segment the target object. However, existing methods are limited by low efficiency and short-term tracking, hindering their applicability in complex real-world surgical scenarios. In this paper, we introduce ReSurgSAM2, a two-stage surgical referring segmentation framework that leverages Segment Anything Model 2 to perform text-referred target detection, followed by tracking with reliable initial frame identification and diversity-driven long-term memory. For the detection stage, we propose a cross-modal spatial-temporal Mamba to generate precise detection and segmentation results. Based on these results, our credible initial…
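
The abstract mentions a diversity-driven long-term memory but is cut off before describing it. As a rough illustration of the general idea only, the sketch below keeps a bounded bank of frame features and admits a new frame when it is both confident and dissimilar to what is already stored. Everything here is an assumption made for illustration: the class name `DiverseMemory`, the thresholds, the cosine-similarity test, and the oldest-first eviction are hypothetical and are not taken from the paper or its repository.

```python
import math
from dataclasses import dataclass, field


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


@dataclass
class DiverseMemory:
    """Hypothetical bounded memory bank; NOT the paper's implementation."""
    capacity: int = 4        # assumed maximum number of stored frames
    min_conf: float = 0.8    # assumed confidence gate for candidates
    max_sim: float = 0.9     # assumed similarity ceiling for admission
    entries: list = field(default_factory=list)  # (frame_idx, feature) pairs

    def maybe_add(self, frame_idx, feature, confidence):
        """Store the frame only if it is confident and not redundant."""
        if confidence < self.min_conf:
            return False
        if any(cosine(feature, f) > self.max_sim for _, f in self.entries):
            return False
        if len(self.entries) >= self.capacity:
            self.entries.pop(0)  # evict the oldest entry (a simple assumed policy)
        self.entries.append((frame_idx, feature))
        return True


# Toy usage with 2-D features standing in for real frame embeddings.
mem = DiverseMemory(capacity=2)
results = [
    mem.maybe_add(0, [1.0, 0.0], 0.95),    # first confident frame
    mem.maybe_add(1, [0.99, 0.05], 0.95),  # near-duplicate of frame 0
    mem.maybe_add(2, [0.0, 1.0], 0.50),    # different but low confidence
    mem.maybe_add(3, [0.0, 1.0], 0.90),    # different and confident
    mem.maybe_add(4, [-1.0, 0.0], 0.90),   # different; bank is full, evicts frame 0
]
stored_frames = [idx for idx, _ in mem.entries]
```

In this toy run the near-duplicate and the low-confidence frame are rejected, so the bank ends up holding frames that differ from one another rather than the most recent ones. A real tracker would compare learned features instead of 2-D vectors, and the admission and eviction rules used by ReSurgSAM2 may differ.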

Code & Models

Repositories

jinlab-imvr/resurgsam2 (PyTorch, Official)

Taxonomy

Methods

Mamba: Linear-Time Sequence Modeling with Selective State Spaces