Think Step by Step: Chain-of-Gesture Prompting for Error Detection in Robotic Surgical Videos
Zhimin Shao, Jialang Xu, Danail Stoyanov, Evangelos B. Mazomenos, Yueming Jin

TL;DR
This paper introduces a real-time, end-to-end error detection framework for robotic surgical videos that leverages contextual reasoning modules inspired by natural language processing to improve accuracy over existing methods.
Contribution
It presents a novel Chain-of-Gesture (COG) prompting framework with two reasoning modules for surgical error detection, outperforming state-of-the-art methods on a benchmark dataset.
Findings
Outperforms state-of-the-art by 4.6% in F1 score
Achieves 6.69 ms per frame processing time
Shows potential to enhance safety and efficacy in robotic minimally invasive surgery
Abstract
Despite significant advancements in robotic systems and surgical data science, ensuring safe and optimal execution in robot-assisted minimally invasive surgery (RMIS) remains a complex challenge. Current surgical error detection methods involve two parts: identifying surgical gestures and then detecting errors within each gesture clip. These methods seldom consider the rich contextual and semantic information inherent in surgical videos, limiting their performance due to reliance on accurate gesture identification. Motivated by chain-of-thought prompting in natural language processing, this letter presents a novel, real-time, end-to-end error detection framework, Chain-of-Gesture (COG) prompting, which leverages contextual information from surgical videos. The framework encompasses two reasoning modules designed to mimic the decision-making processes of expert surgeons. Concretely, we first…
Taxonomy
Topics: Gaze Tracking and Assistive Technology · Robot Manipulation and Learning · Hand Gesture Recognition Systems
Methods: Softmax · Attention Is All You Need
