SUREON: A Benchmark and Vision-Language-Model for Surgical Reasoning
Alejandra Perez, Anita Rau, Lee White, Busisiwe Mlambo, Chinedu Nwoye, Muhammad Abdullah Jamal, and Omid Mohareri

TL;DR
SUREON introduces a large-scale surgical video QA dataset, together with models trained on it that reason about intent, safety, and upcoming steps in surgical scenes and outperform larger general-domain models in accuracy.
Contribution
The paper presents SUREON, a novel dataset and models for surgical reasoning, leveraging expert narrations to train AI for complex surgical question answering.
Findings
The SUREON dataset contains 206.8K QA pairs from 134.7K clips across 170 procedures.
The trained models outperform larger general-domain models, achieving over 84% accuracy on the benchmark.
Qualitative analysis shows models can infer operative intent and reasoning from visual data.
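As a rough sanity check on the dataset figures above, the average number of QA pairs per clip can be worked out from the rounded counts. This is only a sketch: the variable names are illustrative, and the inputs are the rounded values quoted here, not exact dataset sizes.

```python
# Rounded counts as quoted in the findings (assumed approximations,
# not exact dataset sizes).
qa_pairs = 206_800   # "206.8K QA pairs"
clips = 134_700      # "134.7K clips"

# Average number of QA pairs attached to each clip.
qa_per_clip = qa_pairs / clips

print(f"Average QA pairs per clip: {qa_per_clip:.2f}")
```

On these rounded numbers, each clip carries roughly one to two questions on average.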
Abstract
Surgeons don't just see -- they interpret. When an expert observes a surgical scene, they understand not only what instrument is being used, but why it was chosen, what risk it poses, and what comes next. Current surgical AI cannot answer such questions, largely because training data that explicitly encodes surgical reasoning is immensely difficult to annotate at scale. Yet surgical video lectures already contain exactly this -- explanations of intent, rationale, and anticipation, narrated by experts for the purpose of teaching. Though inherently noisy and unstructured, these narrations encode the reasoning that surgical AI currently lacks. We introduce SUREON, a large-scale video QA dataset that systematically harvests this training signal from surgical academic videos. SUREON defines 12 question categories covering safety assessment, decision rationale, and forecasting, and uses a…
Taxonomy
Topics: Multimodal Machine Learning Applications · Surgical Simulation and Training · Clinical Reasoning and Diagnostic Skills
