Strategy-Supervised Autonomous Laparoscopic Camera Control via Event-Driven Graph Mining
Keyu Zhou, Peisen Xu, Yahao Wu, Jiming Chen, Gaofeng Li, Shunlei Li

TL;DR
This paper introduces a strategy-supervised framework for autonomous laparoscopic camera control that combines event-driven graph mining with vision-language inference, improving stability and safety in surgical views.
Contribution
It presents a novel approach that mines camera-handling strategies from surgical videos and applies them in real-time control with safety constraints and human-in-the-loop capabilities.
Findings
Event parsing achieves an F1-score of 0.86 for temporal localization.
Mined strategies align well with expert interpretation (cluster purity 0.81).
The system reduces camera errors by over 35% and image shaking by over 62% in ex vivo tests.
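For readers unfamiliar with the cluster purity figure quoted above, the following is a minimal sketch of the standard purity metric: each cluster is credited with the count of its most common expert label, and the credits are summed and divided by the total number of samples. The summary does not describe the paper's exact evaluation protocol, so the function name, the label names, and the toy data are illustrative assumptions only.

```python
from collections import Counter


def cluster_purity(cluster_ids, expert_labels):
    """Standard purity: (1/N) * sum over clusters of the majority-label count.

    cluster_ids and expert_labels are parallel sequences (hypothetical inputs:
    one mined-strategy cluster id and one expert annotation per sample).
    """
    if len(cluster_ids) != len(expert_labels):
        raise ValueError("cluster_ids and expert_labels must have equal length")
    if len(cluster_ids) == 0:
        raise ValueError("at least one sample is required")

    # Count expert labels separately inside each cluster.
    label_counts = {}
    for cluster, label in zip(cluster_ids, expert_labels):
        label_counts.setdefault(cluster, Counter())[label] += 1

    # Each cluster contributes the size of its most frequent expert label.
    majority_total = sum(max(counts.values()) for counts in label_counts.values())
    return majority_total / len(cluster_ids)


# Toy example with made-up labels: cluster 0 holds two "follow" and one "zoom",
# cluster 1 holds two "zoom".
toy_clusters = [0, 0, 0, 1, 1]
toy_labels = ["follow", "follow", "zoom", "zoom", "zoom"]
toy_purity = cluster_purity(toy_clusters, toy_labels)
```

A purity of 1.0 would mean every cluster contains samples from a single expert category; values closer to 0 indicate mixed clusters.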
Abstract
Autonomous laparoscopic camera control must maintain a stable and safe surgical view under rapid tool-tissue interactions while remaining interpretable to surgeons. We present a strategy-grounded framework that couples high-level vision-language inference with low-level closed-loop control. Offline, raw surgical videos are parsed into camera-relevant temporal events (e.g., interaction, working-distance deviation, and view-quality degradation) and structured as attributed event graphs. Mining these graphs yields a compact set of reusable camera-handling strategy primitives, which provide structured supervision for learning. Online, a fine-tuned Vision-Language Model (VLM) processes the live laparoscopic view to predict the dominant strategy and discrete image-based motion commands, executed by an IBVS-RCM controller under strict safety constraints; optional speech input enables intuitive…
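As a rough illustration of what an attributed event graph might look like, the sketch below stores each parsed event as a node with a type and time span, links events that are close in time, and counts recurring edge patterns as a very simple stand-in for graph mining. Everything here is an assumption made for illustration: the abstract does not specify the graph schema, the edge definition, or the mining algorithm, and the names `Event`, `build_event_graph`, `count_edge_patterns`, and the `max_gap` threshold are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    kind: str     # hypothetical labels, e.g. "interaction", "distance_deviation"
    start: float  # start time in seconds
    end: float    # end time in seconds


def build_event_graph(events, max_gap=2.0):
    """Return (nodes, edges) for a list of events.

    Nodes are the events sorted by start time. An edge (i, j, attrs) links
    event i to a later-starting event j that begins no more than max_gap
    seconds after i ends. max_gap is an assumed parameter, not from the paper.
    """
    nodes = sorted(events, key=lambda e: e.start)
    edges = []
    for i, a in enumerate(nodes):
        for j in range(i + 1, len(nodes)):
            b = nodes[j]
            gap = b.start - a.end
            if gap > max_gap:
                break  # later events start even later, so the gap only grows
            relation = "overlaps" if gap < 0 else "precedes"
            edges.append((i, j, {"relation": relation, "gap": gap}))
    return nodes, edges


def count_edge_patterns(nodes, edges):
    """Count (source kind, relation, target kind) triples across all edges.

    Frequency counting is a simplified placeholder for strategy mining.
    """
    return Counter(
        (nodes[i].kind, attrs["relation"], nodes[j].kind)
        for i, j, attrs in edges
    )


# Made-up events for a short clip.
clip_events = [
    Event("interaction", 0.0, 3.0),
    Event("distance_deviation", 2.5, 4.0),
    Event("view_degradation", 5.0, 6.0),
]
clip_nodes, clip_edges = build_event_graph(clip_events)
clip_patterns = count_edge_patterns(clip_nodes, clip_edges)
```

Patterns that recur across many clips could then be treated as candidate strategy primitives, although the paper's actual mining procedure may differ substantially from this frequency count.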
Taxonomy
Topics: Surgical Simulation and Training · Soft Robotics and Applications · Multimodal Machine Learning Applications
