Strategy-Supervised Autonomous Laparoscopic Camera Control via Event-Driven Graph Mining

Keyu Zhou; Peisen Xu; Yahao Wu; Jiming Chen; Gaofeng Li; Shunlei Li

arXiv:2602.20500 · cs.RO · February 25, 2026

TL;DR

This paper introduces a strategy-supervised framework for autonomous laparoscopic camera control that combines event-driven graph mining with vision-language inference, improving the stability and safety of the surgical view.

Contribution

It presents a novel approach that mines camera-handling strategies from surgical videos and applies them to real-time camera control under safety constraints, with human-in-the-loop capabilities.
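As a rough illustration of what mining strategies from event graphs can involve, the sketch below builds one small attributed event graph per video clip and counts labeled event-to-event transitions that recur across clips. It is a minimal sketch under stated assumptions, not the authors' method: the event labels follow the examples named in the abstract (interaction, working-distance deviation, view-quality degradation), while `Event`, `EventGraph`, `build_event_graph`, `mine_edge_patterns`, the `max_gap` linking rule, and `min_support` are all hypothetical. Counting single-edge patterns is a deliberately simplified stand-in for full frequent-subgraph mining.

```python
from collections import Counter
from dataclasses import dataclass, field

# Event labels follow the examples in the abstract; the exact label set is an assumption.
EVENT_TYPES = {"interaction", "working_distance_deviation", "view_quality_degradation"}


@dataclass
class Event:
    label: str                                 # one of EVENT_TYPES
    start: float                               # onset time in seconds
    end: float                                 # offset time in seconds
    attrs: dict = field(default_factory=dict)  # node attributes, e.g. {"tool": "grasper"}

    def __post_init__(self):
        if self.label not in EVENT_TYPES:
            raise ValueError(f"unknown event label: {self.label}")


@dataclass
class EventGraph:
    nodes: list  # Event objects sorted by onset
    edges: list  # (i, j) pairs meaning node i is followed by node j


def build_event_graph(events, max_gap=2.0):
    """Link each event to every later event whose onset is within max_gap seconds of its offset."""
    nodes = sorted(events, key=lambda e: e.start)
    edges = []
    for i, a in enumerate(nodes):
        for j in range(i + 1, len(nodes)):
            if nodes[j].start - a.end > max_gap:
                break  # onsets are sorted, so every later event is even further away
            edges.append((i, j))
    return EventGraph(nodes=nodes, edges=edges)


def mine_edge_patterns(graphs, min_support=2):
    """Return labeled transitions (label_i, label_j) that occur in at least min_support graphs."""
    support = Counter()
    for g in graphs:
        # A set, so each pattern counts at most once per graph (graph-level support).
        support.update({(g.nodes[i].label, g.nodes[j].label) for i, j in g.edges})
    return {p: c for p, c in support.items() if c >= min_support}


# Usage on two toy clips (made-up timings, for illustration only).
clip_a = [Event("interaction", 0.0, 5.0),
          Event("working_distance_deviation", 5.5, 7.0),
          Event("view_quality_degradation", 20.0, 22.0)]
clip_b = [Event("interaction", 0.0, 3.0),
          Event("working_distance_deviation", 4.0, 6.0),
          Event("view_quality_degradation", 6.5, 8.0)]
graphs = [build_event_graph(clip_a), build_event_graph(clip_b)]
frequent = mine_edge_patterns(graphs)
print(frequent)  # {('interaction', 'working_distance_deviation'): 2}
```

Only the interaction-to-working-distance transition appears in both toy clips, so it is the one pattern that survives `min_support=2`. A graph-level support count like this is what turns many noisy per-video graphs into a compact set of recurring primitives.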

Findings

01. Event parsing achieves an F1-score of 0.86 for temporal event localization.

02. Mined strategies align well with expert interpretation (cluster purity of 0.81).

03. The system reduces camera errors by over 35% and image shaking by over 62% in ex vivo tests.
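The first two findings rest on standard metrics, sketched below for readers unfamiliar with them. The evaluation protocol is not spelled out here, so the details are assumptions: temporal-localization F1 is computed with greedy one-to-one matching of same-label segments at a temporal IoU threshold of 0.5, and cluster purity is the usual fraction of items whose expert label matches their cluster's majority label. The function names and toy data are hypothetical.

```python
from collections import Counter, defaultdict


def cluster_purity(cluster_ids, expert_labels):
    """Fraction of items whose expert label equals the majority label of their cluster."""
    groups = defaultdict(list)
    for c, y in zip(cluster_ids, expert_labels):
        groups[c].append(y)
    majority_total = sum(Counter(ys).most_common(1)[0][1] for ys in groups.values())
    return majority_total / len(expert_labels)


def segment_iou(a, b):
    """Temporal IoU of two (start, end) intervals."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0


def temporal_f1(pred, gt, iou_thr=0.5):
    """F1 over (label, start, end) segments with greedy one-to-one matching at IoU >= iou_thr."""
    matched = set()
    tp = 0
    for label, ps, pe in pred:
        best_k, best_iou = None, iou_thr
        for k, (g_label, gs, ge) in enumerate(gt):
            if k in matched or g_label != label:
                continue
            iou = segment_iou((ps, pe), (gs, ge))
            if iou >= best_iou:
                best_k, best_iou = k, iou
        if best_k is not None:
            matched.add(best_k)
            tp += 1
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gt) if gt else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy usage with made-up data.
print(cluster_purity([0, 0, 0, 1, 1], ["a", "a", "b", "c", "c"]))  # 4 of 5 items -> 0.8
gt = [("interaction", 0.0, 10.0), ("view_quality_degradation", 20.0, 30.0)]
pred = [("interaction", 1.0, 10.0), ("view_quality_degradation", 40.0, 50.0)]
print(temporal_f1(pred, gt))  # one match out of two on each side -> 0.5
```

Purity rewards clusters that are dominated by a single expert label, which is why a value such as 0.81 can be read as "most mined strategies map onto one recognizable expert behavior"; it does not penalize splitting one behavior across several clusters.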

Abstract

Autonomous laparoscopic camera control must maintain a stable and safe surgical view under rapid tool-tissue interactions while remaining interpretable to surgeons. We present a strategy-grounded framework that couples high-level vision-language inference with low-level closed-loop control. Offline, raw surgical videos are parsed into camera-relevant temporal events (e.g., interaction, working-distance deviation, and view-quality degradation) and structured as attributed event graphs. Mining these graphs yields a compact set of reusable camera-handling strategy primitives, which provide structured supervision for learning. Online, a fine-tuned Vision-Language Model (VLM) processes the live laparoscopic view to predict the dominant strategy and discrete image-based motion commands, executed by an IBVS-RCM controller under strict safety constraints; optional speech input enables intuitive…
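To make the online stage concrete, the sketch below shows how a discrete image-based command could drive a classic image-based visual servoing (IBVS) step. It is an illustrative sketch under stated assumptions, not the paper's IBVS-RCM controller: the command vocabulary in `COMMAND_TARGETS`, the functions `interaction_matrix` and `ibvs_step`, and the limits `v_max`, `w_max`, `z_min` are hypothetical. Each command sets a desired normalized image position s* for one tracked point, the standard point-feature law v = -λ L⁺ e with e = s - s* produces a camera twist, and two simple safety rules cap speed and block advancing past a minimum working distance. The remote-center-of-motion (RCM) constraint at the trocar is not modeled.

```python
import numpy as np

# Hypothetical discrete commands -> desired normalized image position s* of one tracked point
# (e.g. the tool tip). The real command vocabulary is not given in the abstract.
COMMAND_TARGETS = {
    "center": (0.0, 0.0),
    "target_left": (-0.1, 0.0),
    "target_right": (0.1, 0.0),
    "target_up": (0.0, -0.1),
    "target_down": (0.0, 0.1),
}


def interaction_matrix(x, y, Z):
    """Classic 2x6 interaction matrix of a normalized image point (x, y) at depth Z."""
    return np.array([
        [-1.0 / Z, 0.0, x / Z, x * y, -(1.0 + x * x), y],
        [0.0, -1.0 / Z, y / Z, 1.0 + y * y, -x * y, -x],
    ])


def ibvs_step(command, s, Z, gain=0.5, v_max=0.01, w_max=0.1, z_min=0.05):
    """One IBVS iteration returning a camera-frame twist (vx, vy, vz, wx, wy, wz).

    Units are illustrative: v_max in m/s, w_max in rad/s, z_min in m.
    """
    if command not in COMMAND_TARGETS:  # "hold" or anything unrecognized: stay still
        return np.zeros(6)
    s = np.asarray(s, dtype=float)
    e = s - np.asarray(COMMAND_TARGETS[command], dtype=float)  # feature error s - s*
    v = -gain * np.linalg.pinv(interaction_matrix(s[0], s[1], Z)) @ e

    # Safety 1: scale the whole twist uniformly so both speed caps hold;
    # uniform scaling keeps the commanded image motion pointing along -e.
    scale = 1.0
    n_lin, n_ang = np.linalg.norm(v[:3]), np.linalg.norm(v[3:])
    if n_lin > v_max:
        scale = min(scale, v_max / n_lin)
    if n_ang > w_max:
        scale = min(scale, w_max / n_ang)
    v = v * scale

    # Safety 2: at or inside the minimum working distance, forbid further advance
    # along the optical axis (+z in the camera frame).
    if Z <= z_min and v[2] > 0:
        v[2] = 0.0
    return v


# Toy usage: the tool tip sits right of center; "center" asks to bring it back.
print(ibvs_step("center", (0.2, 0.0), Z=0.1))
```

Scaling the twist as a whole, rather than clipping linear and angular parts independently, is a deliberate choice: with a full-rank 2x6 interaction matrix, L L⁺ is the identity, so the predicted feature motion stays exactly opposite to the error and the tracked point still moves toward its target, only more slowly.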

Taxonomy

Topics: Surgical Simulation and Training · Soft Robotics and Applications · Multimodal Machine Learning Applications