SimEndoGS: Efficient Data-driven Scene Simulation using Robotic Surgery Videos via Physics-embedded 3D Gaussians
Zhenya Yang, Kai Chen, Yonghao Long, Qi Dou

TL;DR
This paper introduces a data-driven method for reconstructing and simulating surgical scenes from stereo videos using physics-embedded 3D Gaussians, enabling efficient, realistic, and near real-time surgical scene simulation.
Contribution
It proposes a novel learnable 3D Gaussian representation for surgical scenes, integrated with physics-based deformation, learned from stereo videos, and regularized for accuracy and realism.
Findings
Reconstructs surgical scenes in minutes from stereo videos
Produces visually and physically plausible deformations
Operates at speeds approaching real-time
Abstract
Surgical scene simulation plays a crucial role in surgical education and simulator-based robot learning. Traditional approaches to creating these surgical environments involve a labor-intensive process in which designers hand-craft tissue models with textures and geometries for soft-body simulation. This manual approach is not only time-consuming but also limited in scalability and realism. In contrast, data-driven simulation offers a compelling alternative: it has the potential to automatically reconstruct 3D surgical scenes from real-world surgical video data and then apply soft-body physics to them. This area, however, is relatively uncharted. In our research, we introduce 3D Gaussians as a learnable representation for surgical scenes, learned from stereo endoscopic video. To prevent over-fitting and ensure the geometrical correctness of these scenes,…
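As a rough illustration of what a "physics-embedded 3D Gaussian" could look like, the sketch below builds the covariance of a single Gaussian with the parameterization commonly used in 3D Gaussian Splatting (Sigma = R S S^T R^T) and then moves it with a local affine deformation x -> F x + t. The truncated abstract does not specify the paper's actual deformation model, so the covariance update F Sigma F^T and every function name here (`gaussian_covariance`, `deform_gaussian`, and the helpers) are assumptions made for illustration, not the authors' implementation.

```python
import math

def matmul(a, b):
    """Multiply two 3x3 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def transpose(a):
    """Transpose a 3x3 matrix stored as nested lists."""
    return [list(row) for row in zip(*a)]

def quat_to_rotmat(q):
    """Convert a quaternion (w, x, y, z) to a 3x3 rotation matrix."""
    w, x, y, z = q
    n = math.sqrt(w * w + x * x + y * y + z * z)
    w, x, y, z = w / n, x / n, y / n, z / n
    return [
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)],
    ]

def gaussian_covariance(scale, quat):
    """Sigma = R S S^T R^T from per-axis scales and a rotation quaternion."""
    R = quat_to_rotmat(quat)
    S = [[scale[0], 0.0, 0.0], [0.0, scale[1], 0.0], [0.0, 0.0, scale[2]]]
    RS = matmul(R, S)
    return matmul(RS, transpose(RS))

def deform_gaussian(mean, cov, F, t):
    """Apply x -> F x + t to one Gaussian (assumed update, not from the paper).

    The mean is mapped directly; the covariance becomes F Sigma F^T.
    """
    new_mean = [sum(F[i][k] * mean[k] for k in range(3)) + t[i]
                for i in range(3)]
    new_cov = matmul(matmul(F, cov), transpose(F))
    return new_mean, new_cov

# Hypothetical values: a small axis-aligned Gaussian stretched 1.5x along x.
cov = gaussian_covariance([0.02, 0.01, 0.01], [1.0, 0.0, 0.0, 0.0])
stretch = [[1.5, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
new_mean, new_cov = deform_gaussian([0.0, 0.0, 1.0], cov, stretch, [0.1, 0.0, 0.0])
```

In this toy setup the stretch scales the variance along x by 1.5 squared and leaves the other axes unchanged. A soft-body simulator would supply a different F and t per Gaussian at each time step.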
Taxonomy
Topics: Medical Imaging and Analysis · 3D Shape Modeling and Analysis · Medical Image Segmentation Techniques
Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings
