EndoMUST: Monocular Depth Estimation for Robotic Endoscopy via End-to-end Multi-step Self-supervised Training
Liangjing Shao, Linxin Bai, Chenkang Du, Xinrong Chen

TL;DR
EndoMUST introduces a multistep self-supervised training framework for monocular depth estimation in robotic endoscopy, effectively handling lighting variations and sparse textures to improve accuracy.
Contribution
The paper proposes a novel multistep finetuning strategy that enhances self-supervised depth estimation by isolating training modules, achieving state-of-the-art results in endoscopic scenes.
Findings
Achieves 4-10% lower error on the SCARED and Hamlyn datasets.
State-of-the-art performance in zero-shot depth estimation.
Effective handling of lighting variations and sparse textures.
Abstract
Monocular depth estimation and ego-motion estimation are significant tasks for scene perception and navigation in stable, accurate and efficient robot-assisted endoscopy. To tackle lighting variations and sparse textures in endoscopic scenes, multiple techniques including optical flow, appearance flow and intrinsic image decomposition have been introduced into existing methods. However, an effective training strategy for multiple modules is still critical for handling both illumination issues and information interference in self-supervised depth estimation for endoscopy. Therefore, a novel framework with multistep efficient finetuning is proposed in this work. In each epoch of end-to-end training, the process is divided into three steps: optical flow registration, multiscale image decomposition and multiple transformation alignments. At each step, only the related…
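The three-step schedule described in the abstract can be illustrated with a minimal, framework-free sketch. It is not the authors' implementation. The module names (`flow_net`, `decomposition_net`, `depth_net`, `pose_net`), the assignment of modules to steps, and the choice to run each step over all batches before moving on are assumptions made for illustration; the abstract excerpt does not specify them. The counter increment stands in for a real optimizer update.

```python
from dataclasses import dataclass

# Hypothetical mapping from training step to the modules updated in that step.
# The step names come from the abstract; the module names are assumptions.
STEP_MODULES = {
    "optical_flow_registration": {"flow_net"},
    "multiscale_image_decomposition": {"decomposition_net"},
    "multiple_transformation_alignments": {"depth_net", "pose_net"},
}


@dataclass
class Module:
    name: str
    trainable: bool = False
    updates: int = 0  # number of (simulated) optimizer updates applied


def build_modules():
    """Create one frozen placeholder module per name used in STEP_MODULES."""
    names = set().union(*STEP_MODULES.values())
    return {n: Module(n) for n in sorted(names)}


def set_trainable(modules, active):
    """Unfreeze only the modules named in `active`; freeze all others."""
    for m in modules.values():
        m.trainable = m.name in active


def run_epoch(modules, num_batches):
    """Run one epoch as three consecutive steps, each updating its own subset."""
    log = []
    for step, active in STEP_MODULES.items():
        set_trainable(modules, active)
        for _ in range(num_batches):
            for m in modules.values():
                if m.trainable:
                    m.updates += 1  # stand-in for a gradient update
        log.append((step, sorted(n for n, m in modules.items() if m.trainable)))
    return log


if __name__ == "__main__":
    mods = build_modules()
    for step, active in run_epoch(mods, num_batches=3):
        print(step, "->", active)
```

In a real deep-learning framework, the same idea would typically be expressed by toggling gradient tracking on each network's parameters before every step, so that losses computed in one step cannot alter modules belonging to another.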
Taxonomy
Topics: Advanced Vision and Imaging · Soft Robotics and Applications · Advanced Image Processing Techniques
