UniSurgSAM: A Unified Promptable Model for Reliable Surgical Video Segmentation

Haofeng Liu; Ziyue Wang; Alex Y. W. Kong; Guanyi Qin; Yunqiu Xu; Chang Han Low; Mingqi Gao; Lap Yan Lennon Chan; Yueming Jin

arXiv:2604.03645 · eess.IV · April 7, 2026

TL;DR

UniSurgSAM is a promptable surgical video segmentation model that accepts visual, textual, or audio prompts. It targets two failure modes of existing methods: hallucinated predictions when the target is absent, and accumulated mask drift over long procedures.

Contribution

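The paper introduces a decoupled two-stage framework, in contrast to the coupled frameworks of prior PVOS methods, which cause optimization interference between target initialization and tracking. Combined with designs aimed at reliability, it enables real-time, multi-modal surgical video segmentation with state-of-the-art performance.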

Findings

1. Achieves state-of-the-art accuracy across all prompt modalities.
2. Effectively suppresses hallucinations during target absence.
3. Prevents mask drift in long surgical sequences.

Abstract

Surgical video segmentation is fundamental to computer-assisted surgery. In practice, surgeons need to dynamically specify targets throughout extended procedures, using heterogeneous cues such as visual selections, textual expressions, or audio instructions. However, existing Promptable Video Object Segmentation (PVOS) methods are typically restricted to a single prompt modality and rely on coupled frameworks that cause optimization interference between target initialization and tracking. Moreover, these methods produce hallucinated predictions when the target is absent and suffer from accumulated mask drift without failure recovery. To address these challenges, we present UniSurgSAM, a unified PVOS model enabling reliable surgical video segmentation through visual, textual, or audio prompts. Specifically, UniSurgSAM employs a decoupled two-stage framework that independently optimizes…

Code & Models

Repositories

jinlab-imvr/UniSurgSAM (GitHub)
