SUREON: A Benchmark and Vision-Language-Model for Surgical Reasoning

Alejandra Perez, Anita Rau, Lee White, Busisiwe Mlambo, Chinedu Nwoye, Muhammad Abdullah Jamal, and Omid Mohareri

arXiv:2603.06570 · cs.CV · March 9, 2026

Open Access

TL;DR

SUREON introduces a large-scale surgical video QA dataset, along with models trained on it that reason about intent, safety, and future steps in surgical scenes and surpass larger general-domain models in accuracy.

Contribution

The paper presents SUREON, a novel dataset and models for surgical reasoning, leveraging expert narrations to train AI for complex surgical question answering.

Findings

01

The SUREON dataset contains 206.8K QA pairs from 134.7K clips across 170 procedures.

02

The SUREON models outperform larger general-domain models, achieving over 84% accuracy on the benchmark.

03

Qualitative analysis shows models can infer operative intent and reasoning from visual data.
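
To put the dataset scale in perspective, the reported totals can be turned into an average QA density per clip. The snippet below is only a back-of-the-envelope sketch: it uses the rounded figures quoted above (206.8K QA pairs, 134.7K clips), and the constant and function names are illustrative, not taken from the paper or any released code.

```python
# Rounded totals as quoted in the findings (assumption: exact counts may differ slightly).
QA_PAIRS = 206_800
CLIPS = 134_700


def qa_pairs_per_clip(qa_pairs: int, clips: int) -> float:
    """Return the average number of QA pairs per clip."""
    if clips <= 0:
        raise ValueError("clips must be positive")
    return qa_pairs / clips


average = qa_pairs_per_clip(QA_PAIRS, CLIPS)
print(f"{average:.2f} QA pairs per clip on average")  # prints "1.54 QA pairs per clip on average"
```

On these rounded numbers, each clip carries roughly one and a half questions on average; the page does not say how the questions are distributed across clips.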

Abstract

Surgeons don't just see -- they interpret. When an expert observes a surgical scene, they understand not only what instrument is being used, but why it was chosen, what risk it poses, and what comes next. Current surgical AI cannot answer such questions, largely because training data that explicitly encodes surgical reasoning is immensely difficult to annotate at scale. Yet surgical video lectures already contain exactly this -- explanations of intent, rationale, and anticipation, narrated by experts for the purpose of teaching. Though inherently noisy and unstructured, these narrations encode the reasoning that surgical AI currently lacks. We introduce SUREON, a large-scale video QA dataset that systematically harvests this training signal from surgical academic videos. SUREON defines 12 question categories covering safety assessment, decision rationale, and forecasting, and uses a…

Taxonomy

Topics: Multimodal Machine Learning Applications · Surgical Simulation and Training · Clinical Reasoning and Diagnostic Skills