EndoMUST: Monocular Depth Estimation for Robotic Endoscopy via End-to-end Multi-step Self-supervised Training

Liangjing Shao; Linxin Bai; Chenkang Du; Xinrong Chen

arXiv:2506.16017·cs.CV·June 23, 2025

TL;DR

EndoMUST introduces a multistep self-supervised training framework for monocular depth estimation in robotic endoscopy, effectively handling lighting variations and sparse textures to improve accuracy.

Contribution

The paper proposes a novel multistep finetuning strategy that enhances self-supervised depth estimation by isolating training modules, achieving state-of-the-art results in endoscopic scenes.

Findings

01 Achieves 4-10% lower error on the SCARED and Hamlyn datasets.
02 State-of-the-art performance in zero-shot depth estimation.
03 Effective handling of lighting variations and sparse textures.

Abstract

Monocular depth estimation and ego-motion estimation are significant tasks for scene perception and navigation in stable, accurate and efficient robot-assisted endoscopy. To tackle lighting variations and sparse textures in endoscopic scenes, multiple techniques including optical flow, appearance flow and intrinsic image decomposition have been introduced into existing methods. However, an effective training strategy for multiple modules is still critical to deal with both illumination issues and information interference in self-supervised depth estimation for endoscopy. Therefore, a novel framework with multistep efficient finetuning is proposed in this work. In each epoch of end-to-end training, the process is divided into three steps: optical flow registration, multiscale image decomposition and multiple transformation alignments. At each step, only the related…
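The three-step schedule described in the abstract can be pictured with a minimal, framework-free sketch. It is not the authors' implementation. The step names come from the abstract, while the module names (`optical_flow_net`, `decomposition_net`, `depth_net`, `pose_net`), the mapping of modules to steps, and the choice to loop over all batches within each step are hypothetical assumptions made only for illustration.

```python
from dataclasses import dataclass

# Step names follow the abstract. The module names and the
# step-to-module mapping are hypothetical placeholders.
STEP_MODULES = {
    "optical_flow_registration": {"optical_flow_net"},
    "multiscale_image_decomposition": {"decomposition_net"},
    "multiple_transformation_alignments": {"depth_net", "pose_net"},
}


@dataclass
class Module:
    name: str
    trainable: bool = False
    updates: int = 0  # number of simulated parameter updates


def build_modules():
    """Create one placeholder module per name used in STEP_MODULES."""
    names = set().union(*STEP_MODULES.values())
    return {n: Module(n) for n in sorted(names)}


def set_trainable(modules, active_names):
    """Unfreeze only the modules tied to the current step."""
    for m in modules.values():
        m.trainable = m.name in active_names


def train_one_epoch(modules, num_batches):
    """Run the three steps in order (assumed: each step sees all batches)."""
    log = []
    for step_name, active in STEP_MODULES.items():
        set_trainable(modules, active)
        for _ in range(num_batches):
            for m in modules.values():
                if m.trainable:
                    m.updates += 1  # stand-in for an optimizer step
        active_now = sorted(n for n, m in modules.items() if m.trainable)
        log.append((step_name, active_now))
    return log


if __name__ == "__main__":
    modules = build_modules()
    for step_name, active_now in train_one_epoch(modules, num_batches=3):
        print(step_name, active_now)
```

In a PyTorch setting, the `trainable` flag would typically correspond to toggling `requires_grad` on a module's parameters, so that frozen modules still take part in the forward pass but receive no updates during that step.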

Code & Models

Repositories

baymaxshao/endomust (PyTorch, official)

Taxonomy

Topics: Advanced Vision and Imaging · Soft Robotics and Applications · Advanced Image Processing Techniques