TL;DR
SWoMo is a neuro-symbolic model for cataract surgery simulation that combines rule-based dynamics with a diffusion model for realistic visuals, improving generalization and downstream tasks.
Contribution
It introduces a decoupled neuro-symbolic approach with an inverse pairing strategy for sim-to-real translation in surgical simulation.
Findings
Shows qualitative and quantitative improvements over prior work.
Generalizes to unseen interaction geometries.
Enhances downstream phase detection and style transfer.
Abstract
Realistic surgical simulation plays a crucial role in training novice surgeons and in the development of autonomous agents. World models can scale such simulation environments to realistic and diverse procedures by predicting future patient states conditioned on current observations and surgical actions. However, current state-of-the-art approaches often fail to satisfy key criteria required for clinical applicability, including visual realism, physically grounded interactions, and the ability to simulate scenarios beyond the training distribution. Hence, we introduce SWoMo, a neuro-symbolic world model for cataract surgery simulation that decouples motion generation from visual realism. The symbolic component, consisting of a rule-based simulator and scene graph representations, models motion dynamics and tool-tissue interactions, while a diffusion model produces realistic visual…
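The decoupling described in the abstract can be illustrated with a toy sketch: a rule-based step updates a scene graph (motion and tool-tissue contact), and a separate visual model turns that graph into a frame. Everything here is an assumption made for illustration, not SWoMo's actual code: the `Node` and `SceneGraph` types, the `symbolic_step` contact rule, the `contact_radius` parameter, and `placeholder_visual_model`, which merely stands in for where a diffusion model would be called.

```python
import math
from dataclasses import dataclass, field
from typing import Callable, Dict, Set, Tuple


@dataclass(frozen=True)
class Node:
    """Hypothetical scene-graph node: a named tool or tissue at a 2D position."""
    name: str
    x: float
    y: float


@dataclass
class SceneGraph:
    """Hypothetical scene graph: nodes plus (subject, relation, object) edges."""
    nodes: Dict[str, Node] = field(default_factory=dict)
    edges: Set[Tuple[str, str, str]] = field(default_factory=set)


def symbolic_step(graph: SceneGraph, tool: str, dx: float, dy: float,
                  contact_radius: float = 1.0) -> SceneGraph:
    """Rule-based update (assumed rule): move the tool, then mark a 'touches'
    edge for every other node within contact_radius of the tool."""
    old = graph.nodes[tool]
    moved = Node(tool, old.x + dx, old.y + dy)
    nodes = {**graph.nodes, tool: moved}  # copy, so the input graph is untouched
    edges = set()
    for name, node in nodes.items():
        if name == tool:
            continue
        if math.hypot(node.x - moved.x, node.y - moved.y) <= contact_radius:
            edges.add((tool, "touches", name))
    return SceneGraph(nodes, edges)


def placeholder_visual_model(graph: SceneGraph) -> str:
    """Stand-in for a learned visual model (e.g. a diffusion model).
    It only summarizes the graph as text; a real model would emit an image."""
    relations = sorted(f"{s} {r} {o}" for s, r, o in graph.edges)
    return "frame(" + "; ".join(relations) + ")"


def simulate(graph: SceneGraph, tool: str, actions,
             visual_model: Callable[[SceneGraph], str]):
    """Decoupled rollout: dynamics come only from symbolic_step,
    appearance only from visual_model."""
    frames = []
    for dx, dy in actions:
        graph = symbolic_step(graph, tool, dx, dy)
        frames.append(visual_model(graph))
    return graph, frames


start = SceneGraph(nodes={
    "forceps": Node("forceps", 0.0, 0.0),
    "lens": Node("lens", 3.0, 0.0),
})
final, frames = simulate(start, "forceps", [(1.0, 0.0), (1.5, 0.0)],
                         placeholder_visual_model)
```

Because the dynamics never depend on the visual model, the placeholder can be swapped for any conditional image generator without changing the motion rules, which is the property the abstract attributes to the decoupled design.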
