Self-supervised Dense 3D Reconstruction from Monocular Endoscopic Video
Xingtong Liu, Ayushi Sinha, Masaru Ishii, Gregory D. Hager, Russell H. Taylor, Mathias Unberath

TL;DR
This paper introduces a self-supervised learning pipeline for dense 3D reconstruction from monocular endoscopic videos, achieving submillimeter mean residual errors against CT ground truth without prior anatomical models or labeled data.
Contribution
The paper presents a self-supervised approach that relies only on unlabeled monocular endoscopic videos and conventional multi-view stereo algorithms, eliminating the need for manual interaction or patient CT in both training and application.
Findings
Produces photo-realistic dense 3D reconstructions
Achieves submillimeter mean residual errors
Works on unseen patients and scopes
Abstract
We present a self-supervised learning-based pipeline for dense 3D reconstruction from full-length monocular endoscopic videos without a priori modeling of anatomy or shading. Our method only relies on unlabeled monocular endoscopic videos and conventional multi-view stereo algorithms, and requires neither manual interaction nor patient CT in both training and application phases. In a cross-patient study using CT scans as groundtruth, we show that our method is able to produce photo-realistic dense 3D reconstructions with submillimeter mean residual errors from endoscopic videos from unseen patients and scopes.
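The abstract reports mean residual errors against CT but does not spell out how they are computed. As a minimal sketch, assuming the reconstruction has already been registered to the CT surface and both are available as point sets in millimeters, one common way to obtain such a number is the average nearest-neighbor distance from each reconstructed point to the reference. The function name `mean_residual_error` and the toy data are hypothetical and are not taken from the paper.

```python
import numpy as np


def mean_residual_error(reconstruction, reference):
    """Average distance from each reconstructed point to its nearest reference point.

    Both inputs are (N, 3) and (M, 3) arrays in the same units and coordinate
    frame. Registration is assumed to have been done beforehand.
    """
    # Pairwise difference vectors, shape (N, M, 3).
    diffs = reconstruction[:, None, :] - reference[None, :, :]
    # Euclidean distances, shape (N, M).
    dists = np.linalg.norm(diffs, axis=2)
    # Nearest reference point per reconstructed point, then the mean.
    return float(dists.min(axis=1).mean())


# Hypothetical toy data in millimeters: the reconstruction is the reference
# surface shifted by 0.5 mm along z.
reference = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
reconstruction = reference + np.array([0.0, 0.0, 0.5])

error_mm = mean_residual_error(reconstruction, reference)
print(f"mean residual error: {error_mm:.3f} mm")
```

The brute-force pairwise distance matrix is fine for small point sets; for dense reconstructions a spatial index such as a k-d tree would be the usual choice.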
Taxonomy
Topics: Advanced Vision and Imaging · Computer Graphics and Visualization Techniques · Robotics and Sensor-Based Localization
