SAW: Toward a Surgical Action World Model via Controllable and Scalable Video Generation
Sampath Rapuri, Lalithkumar Seenivasan, Dominik Schneider, Roger Soberanis-Mukul, Yufan He, Hao Ding, Jiru Xu, Chenhao Yu, Chenyan Jing, Pengfei Guo, Daguang Xu, Mathias Unberath

TL;DR
SAW introduces a controllable, scalable video diffusion model for generating surgical action videos. It improves realism and temporal consistency, requires only minimal annotations, and supports downstream use in surgical AI and simulation.
Contribution
It proposes a diffusion-based approach conditioned on lightweight signals that generates realistic, temporally consistent surgical videos without requiring complex annotations at inference.
Findings
SAW achieves state-of-the-art temporal consistency in surgical videos.
Augmenting training data for rare surgical actions with SAW-generated videos improves action recognition accuracy.
SAW enables realistic surgical simulation from trajectory data.
Abstract
A surgical world model capable of generating realistic surgical action videos with precise control over tool-tissue interactions can address fundamental challenges in surgical AI and simulation -- from data scarcity and rare event synthesis to bridging the sim-to-real gap for surgical automation. However, current video generation methods, the very core of such surgical world models, require expensive annotations or complex structured intermediates as conditioning signals at inference, limiting their scalability. Other approaches exhibit limited temporal consistency across complex laparoscopic scenes and do not possess sufficient realism. We propose Surgical Action World (SAW) -- a step toward surgical action world modeling through video diffusion conditioned on four lightweight signals: language prompts encoding tool-action context, a reference surgical scene, tissue affordance mask,…
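To make "lightweight conditioning" concrete, here is a minimal sketch of how the inference-time inputs named above might be bundled and sanity-checked before being passed to a generator. Everything in it is a hypothetical illustration, not SAW's actual interface: the class `SurgicalActionCondition`, its field names, the grayscale list-of-lists image format, and the example prompt are all assumptions. The fourth conditioning signal is not named in the excerpt above, so it is left as an untyped placeholder.

```python
from dataclasses import dataclass
from typing import List, Optional


# Hypothetical container for the conditioning inputs; names are illustrative only.
@dataclass
class SurgicalActionCondition:
    prompt: str                       # language prompt encoding tool-action context
    reference_scene: List[List[int]]  # reference surgical frame (H x W grayscale, assumed format)
    affordance_mask: List[List[int]]  # binary tissue affordance mask, same H x W as the scene
    extra_signal: Optional[object] = None  # placeholder: fourth signal not named in the excerpt

    def validate(self) -> None:
        """Raise ValueError if the inputs are inconsistent with each other."""
        if not self.prompt.strip():
            raise ValueError("prompt must be non-empty")
        h = len(self.reference_scene)
        w = len(self.reference_scene[0]) if h else 0
        # The mask must cover exactly the same pixel grid as the reference scene.
        if len(self.affordance_mask) != h or any(len(row) != w for row in self.affordance_mask):
            raise ValueError("affordance mask must match reference scene size")
        # The mask marks tissue regions as 1 and everything else as 0.
        if any(v not in (0, 1) for row in self.affordance_mask for v in row):
            raise ValueError("affordance mask must be binary")

    def mask_coverage(self) -> float:
        """Fraction of pixels marked as tissue affordance."""
        total = sum(len(row) for row in self.affordance_mask)
        marked = sum(sum(row) for row in self.affordance_mask)
        return marked / total if total else 0.0


cond = SurgicalActionCondition(
    prompt="grasper retracts tissue",  # hypothetical prompt
    reference_scene=[[0] * 4 for _ in range(3)],
    affordance_mask=[[0, 1, 1, 0], [0, 1, 1, 0], [0, 0, 0, 0]],
)
cond.validate()
print(cond.mask_coverage())
```

The point of the sketch is the contrast drawn in the abstract: a prompt, a single reference frame, and a coarse mask are cheap to obtain at inference, whereas dense per-frame structured intermediates are not.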
Taxonomy
Topics: Surgical Simulation and Training · Soft Robotics and Applications · Robot Manipulation and Learning
