SurgOnAir: Hierarchy-Aware Real-Time Surgical Video Commentary

Jingyi He, Yue Zhou, Long Bai, Kun Yuan, Nassir Navab, and Yuan Bi

arXiv:2605.21132 · cs.CV · May 21, 2026

TL;DR

SurgOnAir is a streaming, hierarchy-aware vision-language model that narrates surgical video as frames arrive, capturing workflow transitions and evolving details without waiting for offline clip processing.

Contribution

The paper introduces a streaming, hierarchy-aware surgical narration model trained on a newly curated dataset, enabling immediate, fine-grained, and hierarchical understanding of surgical procedures.

Findings

01 Enables instant, fine-grained surgical narration.

02 Captures and signals key workflow transitions.

03 Outperforms existing offline methods in real-time understanding.

Abstract

Understanding surgical workflow in real time is fundamental for intelligent surgical embodiment, where AI systems continuously perceive and respond as surgery proceeds. In the operating room, critical decisions depend on subtle, moment-to-moment changes, such as fine instrument movements and evolving tissue states, where even slight perceptual delays can limit assistance or compromise safety. Yet existing methods remain offline or operate at coarse temporal scales, generating descriptions only after processing clips, preventing immediate reaction. We address this by proposing SurgOnAir, a streaming vision-language model that processes frames sequentially without future access and progressively generates narration tokens as visual input arrives. SurgOnAir achieves fine-grained frame-to-token generation, enabling instant responsiveness to evolving surgical dynamics. Built upon our curated…
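The streaming behavior the abstract describes (frames consumed in order, no access to future frames, narration tokens emitted as input arrives) can be illustrated with a small Python sketch. This is not the authors' implementation: `stream_narration`, `StepFn`, and `toy_step` are hypothetical names introduced here, frames are stood in for by plain strings, and the "model" is a toy rule that emits a token only when the frame label changes.

```python
from typing import Callable, Iterable, Iterator, List, Tuple

Frame = str  # placeholder for an image tensor (assumption for illustration)

# Hypothetical step function: given the frames seen so far and the tokens
# already emitted, return the new tokens for the latest frame (possibly none).
StepFn = Callable[[List[Frame], List[str]], List[str]]


def stream_narration(
    frames: Iterable[Frame], step: StepFn
) -> Iterator[Tuple[int, List[str]]]:
    """Yield (frame_index, new_tokens) as each frame arrives.

    The step function only ever receives frames 0..t, so it cannot
    look ahead; output is produced lazily, one frame at a time.
    """
    seen: List[Frame] = []
    emitted: List[str] = []
    for t, frame in enumerate(frames):
        seen.append(frame)
        new_tokens = step(seen, emitted)
        emitted.extend(new_tokens)
        yield t, new_tokens


def toy_step(seen: List[Frame], emitted: List[str]) -> List[str]:
    """Toy stand-in for a model: narrate only when the frame label changes."""
    if len(seen) == 1 or seen[-1] != seen[-2]:
        return [seen[-1]]
    return []


if __name__ == "__main__":
    for t, tokens in stream_narration(["grasp", "grasp", "cut"], toy_step):
        print(t, tokens)
```

Because `stream_narration` is a generator, a caller can react to each frame's tokens before the next frame exists, which is the property that distinguishes a streaming setup from clip-level offline captioning.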
