Generating Natural-Language Surgical Feedback: From Structured Representation to Domain-Grounded Evaluation
Firdavs Nasriddinov, Rafal Kocielnik, Anima Anandkumar, Andrew J. Hung

TL;DR
This paper introduces a structure-aware pipeline that learns a surgical action ontology from real feedback transcripts to generate clinically grounded, trainer-style surgical feedback using GPT-4, improving fidelity and verifiability.
Contribution
It presents a novel method for mining and normalizing surgical action triplets and conditioning feedback generation on these structured representations, enhancing automation and clinical relevance.
Findings
Improved video-to-IAT recognition accuracy with context and temporal tracking.
Enhanced feedback fidelity, doubling admissible feedback from 21% to 42%.
Significant reduction in word error rate and increase in ROUGE scores.
Abstract
High-quality intraoperative feedback from a surgical trainer is pivotal for improving trainee performance and long-term skill acquisition. Automating natural, trainer-style feedback promises timely, accessible, and consistent guidance at scale but requires models that understand clinically relevant representations. We present a structure-aware pipeline that learns a surgical action ontology from real trainer-to-trainee transcripts (33 surgeries) and uses it to condition feedback generation. We contribute by (1) mining Instrument-Action-Target (IAT) triplets from real-world feedback text and clustering surface forms into normalized categories, (2) fine-tuning a video-to-IAT model that leverages the surgical procedure and task contexts as well as fine-grained temporal instrument motion, and (3) demonstrating how to effectively use IAT triplet representations to guide GPT-4o in generating…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsSurgical Simulation and Training · Multimodal Machine Learning Applications · Artificial Intelligence in Healthcare and Education
