SurGo-R1: Benchmarking and Modeling Contextual Reasoning for Operative Zone in Surgical Video

Guanyi Qin; Xiaozhen Wang; Zhu Zhuo; Chang Han Low; Yuancan Xiao; Yibing Fu; Haofeng Liu; Kai Wang; Chunjiang Li; Yueming Jin

arXiv:2602.21706 · cs.CV · February 26, 2026

TL;DR

This paper introduces ResGo, a benchmark for identifying safe operative zones in surgical videos, and SurGo-R1, a model that uses phase-aware reasoning to identify those zones and significantly outperforms existing vision-language models.

Contribution

The paper presents SurGo-R1, a phase-then-go architecture trained with RLHF, and a comprehensive benchmark with annotations for evaluating contextual reasoning in surgical videos.

Findings

1. SurGo-R1 achieves 76.6% phase accuracy.
2. It attains 32.7 mIoU in zone detection.
3. It improves hardcore accuracy by 6.6× over generalist models.

Abstract

Minimally invasive surgery has dramatically improved patient operative outcomes, yet identifying safe operative zones remains challenging in critical phases, requiring surgeons to integrate visual cues, procedural phase, and anatomical context under high cognitive load. Existing AI systems offer binary safety verification or static detection, ignoring the phase-dependent nature of intraoperative reasoning. We introduce ResGo, a benchmark of laparoscopic frames annotated with Go Zone bounding boxes and clinician-authored rationales covering phase, exposure quality reasoning, next action, and risk reminders. We introduce evaluation metrics that treat correct grounding under an incorrect phase as a failure, revealing that most vision-language models perform poorly on such tasks. We then present SurGo-R1, a model optimized via RLHF with a multi-turn phase-then-go architecture where…
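
The phase-gated evaluation rule described in the abstract can be illustrated with a small sketch. It assumes each benchmark item has one predicted and one ground-truth phase label plus one predicted and one ground-truth Go Zone box in (x1, y1, x2, y2) form, and that a grounding hit requires IoU ≥ 0.5. The names `Sample`, `iou` and `phase_gated_scores`, the phase labels, and the threshold are hypothetical illustrations, not ResGo's published metric definitions.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Hypothetical box format: (x1, y1, x2, y2) in pixel coordinates.
Box = Tuple[float, float, float, float]


@dataclass
class Sample:
    pred_phase: str  # hypothetical phase label predicted by the model
    gt_phase: str    # hypothetical ground-truth phase label
    pred_box: Box    # predicted Go Zone box
    gt_box: Box      # annotated Go Zone box


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = max(0.0, a[2] - a[0]) * max(0.0, a[3] - a[1])
    area_b = max(0.0, b[2] - b[0]) * max(0.0, b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def phase_gated_scores(samples: List[Sample], iou_threshold: float = 0.5) -> Dict[str, float]:
    """Score grounding so that a correct box under a wrong phase counts as a failure (assumed rule)."""
    if not samples:
        raise ValueError("need at least one sample")
    n = len(samples)
    phase_ok = [s.pred_phase == s.gt_phase for s in samples]
    raw_ious = [iou(s.pred_box, s.gt_box) for s in samples]
    # Zero out the IoU of every item whose phase prediction is wrong.
    gated_ious = [v if ok else 0.0 for v, ok in zip(raw_ious, phase_ok)]
    hits = sum(1 for v, ok in zip(raw_ious, phase_ok) if ok and v >= iou_threshold)
    return {
        "phase_acc": sum(phase_ok) / n,
        "mean_iou_gated": sum(gated_ious) / n,
        "gated_success": hits / n,
    }


if __name__ == "__main__":
    samples = [
        Sample("P1", "P1", (0, 0, 10, 10), (0, 0, 10, 10)),  # right phase, perfect box
        Sample("P2", "P1", (0, 0, 10, 10), (0, 0, 10, 10)),  # perfect box but wrong phase
        Sample("P1", "P1", (0, 0, 10, 10), (5, 0, 15, 10)),  # right phase, IoU = 1/3
    ]
    scores = phase_gated_scores(samples)
    print({k: round(v, 3) for k, v in scores.items()})
    # → {'phase_acc': 0.667, 'mean_iou_gated': 0.444, 'gated_success': 0.333}
```

The second item shows the intent of the gating: its box matches the annotation exactly, yet it contributes nothing to either grounding score because the phase was misidentified.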

Taxonomy

Topics: Surgical Simulation and Training · Multimodal Machine Learning Applications · Soft Robotics and Applications