Endo-Depth-and-Motion: Reconstruction and Tracking in Endoscopic Videos using Depth Networks and Photometric Constraints
David Recasens, José Lamarca, José M. Fácil, J. M. M. Montiel, Javier Civera

TL;DR
This paper introduces Endo-Depth-and-Motion, a pipeline built on self-supervised depth networks that reconstructs dense 3D scenes and tracks camera motion in monocular endoscopic videos, a setting made difficult by the deformation of in-body cavities and the lack of texture.
Contribution
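It uses self-supervised depth networks to turn monocular endoscopic frames into pseudo-RGBD frames, tracks the 6-degrees-of-freedom camera pose with photometric residuals, and fuses the registered depth maps into a dense volumetric 3D model, with an extensive evaluation on the public Hamlyn dataset.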
Findings
High-quality dense 3D reconstructions on the public Hamlyn dataset
6-degrees-of-freedom camera pose tracked from photometric residuals
Compared against relevant baselines, with all models and code released
Abstract
Estimating a scene reconstruction and the camera motion from in-body videos is challenging due to several factors, e.g. the deformation of in-body cavities or the lack of texture. In this paper we present Endo-Depth-and-Motion, a pipeline that estimates the 6-degrees-of-freedom camera pose and dense 3D scene models from monocular endoscopic videos. Our approach leverages recent advances in self-supervised depth networks to generate pseudo-RGBD frames, then tracks the camera pose using photometric residuals and fuses the registered depth maps in a volumetric representation. We present an extensive experimental evaluation in the public dataset Hamlyn, showing high-quality results and comparisons against relevant baselines. We also release all models and code for future comparisons.
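To make the tracking step in the abstract more concrete, the following is a minimal NumPy sketch of a photometric residual between a pseudo-RGBD reference frame and a new frame. It is not the authors' released code. The function names (`backproject`, `photometric_residuals`), the grayscale images, the pinhole intrinsics `K`, and the nearest-neighbour sampling are all assumptions made for illustration. A real tracker would typically interpolate intensities and minimise the residuals over the pose with an iterative solver.

```python
import numpy as np

def backproject(depth, K):
    """Lift every pixel of a depth map to a 3D point in the camera frame."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3).astype(np.float64)
    rays = pix @ np.linalg.inv(K).T          # normalised viewing rays, one per pixel
    return rays * depth.reshape(-1, 1)       # scale each ray by its depth

def photometric_residuals(I_ref, D_ref, I_cur, K, T_cur_ref):
    """Intensity differences I_cur(warp(x)) - I_ref(x) for a candidate pose.

    T_cur_ref is a 4x4 rigid transform taking reference-frame points into the
    current camera frame. Sampling is nearest-neighbour (a simplification).
    Returns the residuals and a flat boolean mask of the pixels that were used.
    """
    h, w = I_ref.shape
    X_ref = backproject(D_ref, K)
    X_cur = X_ref @ T_cur_ref[:3, :3].T + T_cur_ref[:3, 3]
    z = X_cur[:, 2]
    valid = z > 1e-6                          # keep points in front of the camera
    z_safe = np.where(valid, z, 1.0)          # avoid division by zero
    proj = X_cur @ K.T
    ui = np.rint(proj[:, 0] / z_safe).astype(int)
    vi = np.rint(proj[:, 1] / z_safe).astype(int)
    valid &= (ui >= 0) & (ui < w) & (vi >= 0) & (vi < h)
    r = I_cur[vi[valid], ui[valid]] - I_ref.reshape(-1)[valid]
    return r, valid

if __name__ == "__main__":
    # Toy scene: a horizontal intensity ramp on a fronto-parallel plane.
    K = np.array([[10.0, 0.0, 2.0], [0.0, 10.0, 2.0], [0.0, 0.0, 1.0]])
    I = np.tile(np.arange(5, dtype=float), (4, 1))
    D = np.full((4, 5), 10.0)
    T = np.eye(4)
    T[0, 3] = 1.0                             # candidate pose: shift along x
    r_id, _ = photometric_residuals(I, D, I, K, np.eye(4))
    r_tx, _ = photometric_residuals(I, D, I, K, T)
    print("mean squared residual, identity pose:", np.mean(r_id ** 2))
    print("mean squared residual, shifted pose: ", np.mean(r_tx ** 2))
```

In this toy setup both images are identical, so the identity pose yields zero residuals and the shifted pose yields a larger cost. A tracker along these lines would search for the pose that minimises that cost, and the resulting registered depth maps would then be fused into the volumetric model.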
Taxonomy
Topics: Advanced Vision and Imaging · Human Pose and Action Recognition · Robotics and Sensor-Based Localization
