Realistic Surgical Simulation from Monocular Videos
Kailing Wang, Chen Yang, Keyang Zhao, Xiaokang Yang, Wei Shen

TL;DR
SurgiSim is a system that creates realistic surgical simulations from monocular videos by maintaining a canonical 3D scene and modeling tissue deformations, improving the fidelity of soft-tissue simulation.
Contribution
The paper introduces SurgiSim, a novel automatic surgical simulation system that enhances geometric consistency and physical realism in soft tissue simulations from monocular videos.
Findings
Successfully simulates soft tissue deformations in surgical scenarios
Demonstrates improved geometric consistency in 3D scene reconstruction
Shows potential for surgical training and robotic surgery applications
Abstract
This paper tackles the challenge of automatically performing realistic surgical simulations from readily available surgical videos. Recent efforts have successfully integrated physically grounded dynamics within 3D Gaussians to perform high-fidelity simulations in well-reconstructed simulation environments from static scenes. However, they struggle with the geometric inconsistency in reconstructing simulation environments and unrealistic physical deformations in simulations of soft tissues when it comes to dynamic and complex surgical processes. In this paper, we propose SurgiSim, a novel automatic simulation system to overcome these limitations. To build a surgical simulation environment, we maintain a canonical 3D scene composed of 3D Gaussians coupled with a deformation field to represent a dynamic surgical scene. This process involves a multi-stage optimization with trajectory and…
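The abstract's representation (a canonical 3D scene of Gaussians coupled with a deformation field) can be illustrated with a minimal sketch. It assumes that each frame's Gaussian centers are the shared canonical centers plus a displacement D(x, t). The function `deformation_field` is a hypothetical, hand-written stand-in for the learned field, and covariances, opacities, and colors are omitted. This is not the paper's implementation.

```python
import numpy as np

# Hypothetical canonical scene: N Gaussian centers in a fixed (canonical) frame.
# Real 3D Gaussians also carry covariance, opacity, and color; only centers are kept here.
rng = np.random.default_rng(0)
canonical_means = rng.uniform(-1.0, 1.0, size=(100, 3))


def deformation_field(x, t):
    """Hypothetical stand-in for a learned deformation field D(x, t).

    Returns a per-point displacement: a smooth sinusoidal motion along z
    whose amplitude grows linearly with t in [0, 1].
    """
    amp = 0.05 * t
    disp = np.zeros_like(x)
    disp[:, 2] = amp * np.sin(np.pi * x[:, 0]) * np.cos(np.pi * x[:, 1])
    return disp


def deformed_means(t):
    # Every frame reuses the same canonical Gaussians, moved by D(x, t).
    return canonical_means + deformation_field(canonical_means, t)


frame0 = deformed_means(0.0)
frame1 = deformed_means(1.0)
print(np.allclose(frame0, canonical_means))  # True: no displacement at t = 0
print(float(np.abs(frame1 - canonical_means).max()) <= 0.05)  # True
```

Keeping geometry in one canonical set and pushing all time dependence into the deformation field is what lets every frame share a single, consistent 3D scene.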
Peer Reviews
Decision: ICLR 2025 Conference Withdrawn Submission
The work proposes a novel solution to alleviate difficulties in surgical simulation pipelines. Current approaches use mesh-based tissue modeling, which, although realistic, requires manual modeling. The authors combine several existing technologies in a novel way to construct a realistic environment that maintains physical realism. The paper is well written and technically sound. The user study is another strength of this approach, as it is both quantitatively validated and qualitatively wit
The main weaknesses I see in this work relate to generalization, proper application, and certain modeling decisions.
- In the design of the geometric regularization, how do the authors guarantee that no tissue inversions occur? As described on Line 244 (Multi-Stage Optimization with Trajectory and Geometric Regularization), the authors add a soft regularization term to prevent large deformations of the Gaussian points. However, this is a soft loss, and I am not sure how well motivated it is.
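The concern about a soft loss can be made concrete with a toy example. It assumes a soft regularizer of the form mean ||u||² on point displacements u (the paper's exact loss is not given here) and uses the standard criterion that a deformation is locally inverted where det(F) ≤ 0, with F = I + ∇u. The displacement field below is hypothetical. It keeps every displacement tiny, so the penalty stays small, yet it still inverts the material near x = 0 because its spatial gradient is large.

```python
import numpy as np


def displacement(x, amp, freq):
    """Hypothetical field u(x) = (-amp * sin(freq * x0), 0, 0)."""
    u = np.zeros_like(x)
    u[:, 0] = -amp * np.sin(freq * x[:, 0])
    return u


def soft_displacement_penalty(u):
    """Assumed soft regularizer: mean squared displacement magnitude."""
    return float(np.mean(np.sum(u**2, axis=1)))


def deformation_gradient_det(x, amp, freq, h=1e-5):
    """det(F) with F = I + grad(u), estimated by central finite differences."""
    n = x.shape[0]
    F = np.tile(np.eye(3), (n, 1, 1))
    for j in range(3):
        e = np.zeros(3)
        e[j] = h
        du = (displacement(x + e, amp, freq) - displacement(x - e, amp, freq)) / (2 * h)
        F[:, :, j] += du  # column j holds du/dx_j
    return np.linalg.det(F)


# Sample points along a short segment of the x-axis.
x = np.stack([np.linspace(-0.05, 0.05, 101), np.zeros(101), np.zeros(101)], axis=1)
amp, freq = 0.02, 100.0  # max |u| = 0.02, but amp * freq = 2 > 1
u = displacement(x, amp, freq)

print(soft_displacement_penalty(u) < 1e-3)  # True: the soft loss is small
print(deformation_gradient_det(x, amp, freq).min() < 0)  # True: some points invert
```

Here det(F) = 1 - amp·freq·cos(freq·x0), so inversion depends on the gradient of the displacement, not its size. A penalty on displacement magnitude alone cannot rule it out. A hard constraint or an explicit det(F) > 0 check would be needed for a guarantee.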
- Interesting approach, including a novel multi-stage optimisation of trajectory and geometric regularisation
- Estimating physical parameters of tissues directly from videos is a welcome innovation
- The method uses off-the-shelf monocular methods, meaning that stereo information and manual segmentations are not required
- The authors perform a user study with 44 surgeons and 24 laypersons, where the majority found their method to produce the most realistic simulations
- SurgiSim achieves a g
- Limited evaluation, in particular no quantitative results (see questions)
- Details regarding methods and evaluation are missing (see questions)
- The paper is on the whole well written, but could be clearer and more descriptive in places
- The paper has a clear motivation. Maintaining a physics-based model of tissue deformation is critical in surgical environments.
- The use of physical parameters gives the optimization target clear global information, rather than relying solely on Gaussian parameters. This gives the approach practical significance and explainability.
- The technical aspects are somewhat unclear. My understanding is that [1] attempts to learn physical properties from prior knowledge drawn from video diffusion models. Although the video diffusion model (VDM) is an imperfect prior, it can provide additional information about the material. However, endoscopic reconstruction is primarily a reconstruction task, where most of the information is already present in the video. This work does not introduce additional priors apart from t
1. The user study part is good and ensures the usability of the proposed system.
1. The methodology presented in this paper is similar to that of "SimEndoGS: Efficient Data-driven Scene Simulation using Robotic Surgery Videos via Physics-embedded 3D Gaussians", with no substantial novel improvements. The evaluation in Figure 3 does not show an advance over SimEndoGS.
2. It lacks a comprehensive quantitative evaluation of the proposed method itself; e.g., it does not compare against SimEndoGS.
3. It lacks substantial testing.
Taxonomy
Topics: Surgical Simulation and Training · Augmented Reality Applications
