Distilled Visual and Robot Kinematics Embeddings for Metric Depth Estimation in Monocular Scene Reconstruction
Ruofeng Wei, Bin Li, Hangjie Mo, Fangxun Zhong, Yonghao Long, Qi Dou, Yun-Hui Liu, Dong Sun

TL;DR
This paper introduces a novel deep learning framework that combines robot kinematics and monocular endoscopy to accurately estimate metric depth and reconstruct 3D surgical scenes, overcoming limitations of traditional stereo methods.
Contribution
The authors develop a unified approach integrating robot kinematics, monocular images, and deep learning for precise metric depth estimation and 3D reconstruction in robotic surgery.
Findings
Achieved comparable depth estimation performance to stereo methods on public datasets.
Developed a Depth-driven Sliding Optimization (DDSO) algorithm for scale extraction.
Successfully reconstructed 3D surgical scenes from monocular endoscopic videos.
Abstract
Estimating precise metric depth and reconstructing scenes from monocular endoscopy is a fundamental task for surgical navigation in robotic surgery. However, traditional stereo matching relies on binocular images to perceive depth, which makes it difficult to transfer to soft-robotics-based surgical systems that use a monocular endoscope. In this paper, we present a novel framework that combines robot kinematics and monocular endoscope images with deep unsupervised learning in a single network for metric depth estimation, and then achieves 3D reconstruction of complex anatomy. Specifically, we first obtain the relative depth maps of surgical scenes by leveraging a brightness-aware monocular depth estimation method. Then, the corresponding endoscope poses are computed based on non-linear optimization of geometric and photometric reprojection residuals. Afterwards, we…
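The abstract describes relative (up-to-scale) depth and poses from images, with robot kinematics supplying the missing metric scale. The sketch below illustrates that general idea only; it is not the authors' DDSO algorithm, whose details are not given here. It assumes hypothetical inputs: per-frame camera displacements from visual pose estimation (arbitrary units) and the matching displacements of the camera centre from robot kinematics (metric units). All function names and the window size are illustrative.

```python
from math import sqrt
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def _norm(v: Vec3) -> float:
    """Euclidean length of a 3-vector."""
    return sqrt(sum(c * c for c in v))


def window_scale(visual_steps: Sequence[Vec3],
                 kinematic_steps: Sequence[Vec3]) -> float:
    """Least-squares scale s minimising sum_i (s*|t_vis_i| - |t_kin_i|)^2.

    Closed form: s = sum(|t_vis|*|t_kin|) / sum(|t_vis|^2).
    """
    num = 0.0
    den = 0.0
    for t_vis, t_kin in zip(visual_steps, kinematic_steps):
        n_vis, n_kin = _norm(t_vis), _norm(t_kin)
        num += n_vis * n_kin
        den += n_vis * n_vis
    if den == 0.0:
        raise ValueError("no visual motion in this window; scale is unobservable")
    return num / den


def sliding_scales(visual_steps: Sequence[Vec3],
                   kinematic_steps: Sequence[Vec3],
                   window: int = 3) -> List[float]:
    """One scale estimate per sliding window of consecutive frame pairs."""
    if len(visual_steps) != len(kinematic_steps):
        raise ValueError("step sequences must have equal length")
    n = len(visual_steps)
    return [
        window_scale(visual_steps[i:i + window], kinematic_steps[i:i + window])
        for i in range(n - window + 1)
    ]


def to_metric(relative_depth: Sequence[Sequence[float]],
              scale: float) -> List[List[float]]:
    """Convert an up-to-scale depth map to metric depth by uniform scaling."""
    return [[scale * d for d in row] for row in relative_depth]


# Hypothetical data: kinematic displacements are exactly 4x the visual ones.
visual = [(1.0, 0.0, 0.0), (0.0, 2.0, 0.0), (0.0, 0.0, 0.5), (1.0, 1.0, 0.0)]
kinematic = [(4.0, 0.0, 0.0), (0.0, 8.0, 0.0), (0.0, 0.0, 2.0), (4.0, 4.0, 0.0)]
scales = sliding_scales(visual, kinematic, window=3)
metric_depth = to_metric([[0.5, 1.0]], scales[0])
```

Comparing displacement magnitudes rather than full vectors keeps the sketch independent of the coordinate frames involved, under the stated assumption that the kinematic displacement refers to the camera centre. A real system would also need to handle noise, near-zero motion, and outliers.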
Taxonomy
Topics: Advanced Vision and Imaging · Robotics and Sensor-Based Localization · Soft Robotics and Applications
