SimEndoGS: Efficient Data-driven Scene Simulation using Robotic Surgery Videos via Physics-embedded 3D Gaussians

Zhenya Yang; Kai Chen; Yonghao Long; Qi Dou

arXiv:2405.00956·cs.RO·August 7, 2024·1 cites

TL;DR

This paper introduces a data-driven method for reconstructing and simulating surgical scenes from stereo videos using physics-embedded 3D Gaussians, enabling efficient, realistic, and near real-time surgical scene simulation.

Contribution

It proposes a novel learnable 3D Gaussian representation for surgical scenes, integrated with physics-based deformation, learned from stereo videos, and regularized for accuracy and realism.

Findings

01 Reconstructs surgical scenes in minutes from stereo videos

02 Produces visually and physically plausible deformations

03 Operates at speeds approaching real-time

Abstract

Surgical scene simulation plays a crucial role in surgical education and simulator-based robot learning. Traditional approaches to creating these surgical environments involve a labor-intensive process in which designers hand-craft tissue models with textures and geometries for soft-body simulation. This manual approach is not only time-consuming but also limited in scalability and realism. In contrast, data-driven simulation offers a compelling alternative: it has the potential to automatically reconstruct 3D surgical scenes from real-world surgical video data and then apply soft-body physics to them. This area, however, is relatively uncharted. In our research, we introduce 3D Gaussians as a learnable representation for surgical scenes, learned from stereo endoscopic video. To prevent over-fitting and ensure the geometrical correctness of these scenes,…

Taxonomy

Topics: Medical Imaging and Analysis · 3D Shape Modeling and Analysis · Medical Image Segmentation Techniques

Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings