Multi-Modal Monocular Endoscopic Depth and Pose Estimation with Edge-Guided Self-Supervision

Xinwei Ju; Rema Daher; Danail Stoyanov; Sophia Bano; Francisco Vasconcelos

arXiv:2602.17785 · cs.CV · February 23, 2026

TL;DR

This paper introduces PRISM, a self-supervised framework for monocular depth and pose estimation in colonoscopy, leveraging edge detection and luminance decoupling to improve accuracy despite challenging conditions.

Contribution

The paper presents a novel self-supervised learning approach that incorporates anatomical and illumination priors, including edge maps and shading cues, for better depth and pose estimation in colonoscopy.

Findings

01. Self-supervised training on real data outperforms supervised training on phantom data.

02. Video frame rate significantly impacts model performance.

03. Domain realism matters more than ground-truth availability.

Abstract

Monocular depth and pose estimation play an important role in the development of colonoscopy-assisted navigation, as they enable improved screening by reducing blind spots, minimizing the risk of missed or recurrent lesions, and lowering the likelihood of incomplete examinations. However, this task remains challenging due to the presence of texture-less surfaces, complex illumination patterns, deformation, and a lack of in-vivo datasets with reliable ground truth. In this paper, we propose **PRISM** (Pose-Refinement with Intrinsic Shading and edge Maps), a self-supervised learning framework that leverages anatomical and illumination priors to guide geometric learning. Our approach uniquely incorporates edge detection and luminance decoupling for structural guidance. Specifically, edge maps are derived using a learning-based edge detector (e.g., DexiNed or HED) trained to capture thin…
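To make the two structural cues named in the abstract concrete, here is a minimal sketch of what an edge map and a luminance/colour split look like for a single frame. It is an illustration under stated assumptions, not PRISM's pipeline: a fixed Sobel filter stands in for the learned edge detector (DexiNed or HED), a Rec. 601 luma split stands in for the luminance decoupling (whose actual formulation is not given in the text above), and the function names `split_luminance` and `sobel_edges` are hypothetical.

```python
import numpy as np


def split_luminance(rgb, eps=1e-6):
    """Split an RGB image (H, W, 3), floats in [0, 1], into luma and colour.

    Assumption: Rec. 601 luma weights; the paper's decoupling may differ.
    """
    weights = np.array([0.299, 0.587, 0.114])
    luma = rgb @ weights                    # (H, W) brightness channel
    chroma = rgb / (luma[..., None] + eps)  # colour with brightness divided out
    return luma, chroma


def sobel_edges(gray):
    """Gradient-magnitude edge map from 3x3 Sobel filters (edge-padded).

    Stand-in for a learned detector such as DexiNed or HED.
    """
    p = np.pad(gray, 1, mode="edge")
    # Horizontal gradient: right column minus left column of each 3x3 window.
    gx = (p[:-2, 2:] + 2 * p[1:-1, 2:] + p[2:, 2:]
          - p[:-2, :-2] - 2 * p[1:-1, :-2] - p[2:, :-2])
    # Vertical gradient: bottom row minus top row of each 3x3 window.
    gy = (p[2:, :-2] + 2 * p[2:, 1:-1] + p[2:, 2:]
          - p[:-2, :-2] - 2 * p[:-2, 1:-1] - p[:-2, 2:])
    return np.hypot(gx, gy)
```

On a synthetic frame with a dark left half and a bright right half, `sobel_edges(split_luminance(frame)[0])` responds only along the vertical boundary between the two halves. In a self-supervised setting, maps like these would typically act as extra inputs or loss weights alongside the photometric objective.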

Taxonomy

Topics: Colorectal Cancer Screening and Detection · Robotics and Sensor-Based Localization · Advanced Vision and Imaging