Confidence-aware Monocular Depth Estimation for Minimally Invasive Surgery

Muhammad Asad; Emanuele Colleoni; Pritesh Mehta; Nicolas Toussaint; Ricardo Sanchez-Matilla; Maria Robu; Faisal Bashir; Rahim Mohammadi; Imanol Luengo; Danail Stoyanov

arXiv:2603.03571·cs.CV·March 5, 2026

TL;DR

This paper introduces a confidence-aware monocular depth estimation framework for minimally invasive surgery. It improves depth accuracy and outputs confidence maps, which are intended to make predictions more clinically reliable when endoscopic video is degraded by smoke, specular reflections, blur, or occlusions.

Contribution

It presents a novel framework that combines calibrated confidence targets, a confidence-aware loss, and inference-time confidence prediction to improve depth estimation in MIS.

Findings

01. Improves depth estimation accuracy by ~8% on clinical datasets.

02. Enables robust confidence quantification of depth predictions.

03. Addresses noise and artifacts in endoscopic images.

Abstract

Purpose: Monocular depth estimation (MDE) is vital for scene understanding in minimally invasive surgery (MIS). However, endoscopic video sequences are often contaminated by smoke, specular reflections, blur, and occlusions, limiting the accuracy of MDE models. In addition, current MDE models do not output depth confidence, which could be a valuable tool for improving their clinical reliability. Methods: We propose a novel confidence-aware MDE framework featuring three significant contributions: (i) Calibrated confidence targets: an ensemble of fine-tuned stereo matching models is used to convert disparity variance into pixel-wise confidence probabilities; (ii) Confidence-aware loss: baseline MDE models are optimized with confidence-aware loss functions, utilizing pixel-wise confidence probabilities such that reliable pixels dominate training; and (iii) Inference-time confidence: a…
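
Contributions (i) and (ii) of the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say how disparity variance is mapped to a probability or which loss is weighted, so the exponential mapping, the `scale` parameter, the L1 error, and both function names are assumptions made for illustration. Maps are flattened to plain lists to keep the sketch dependency-free; real code would likely use an array library.

```python
import math
from statistics import pvariance


def confidence_from_ensemble(disparities, scale=1.0):
    """Turn per-pixel ensemble disagreement into a confidence in (0, 1].

    disparities: one flattened disparity map (list of floats) per ensemble member.
    scale: hypothetical temperature; the exp(-var / scale) mapping is an
           assumption, not the calibration procedure used in the paper.
    """
    n_pixels = len(disparities[0])
    confidence = []
    for p in range(n_pixels):
        # Variance across ensemble members at pixel p.
        var = pvariance([d[p] for d in disparities])
        # Zero variance gives confidence 1; larger variance decays toward 0.
        confidence.append(math.exp(-var / scale))
    return confidence


def confidence_weighted_l1(pred, target, confidence, eps=1e-8):
    """Confidence-weighted mean absolute error (assumed loss form).

    Pixels with high confidence contribute more to the loss, so reliable
    pixels dominate the training signal.
    """
    weighted_error = sum(
        c * abs(p - t) for p, t, c in zip(pred, target, confidence)
    )
    return weighted_error / (sum(confidence) + eps)


# Toy example: two ensemble members, two pixels.
ensemble = [[1.0, 2.0], [1.0, 4.0]]   # members agree on pixel 0, disagree on pixel 1
conf = confidence_from_ensemble(ensemble)
loss = confidence_weighted_l1(pred=[1.0, 5.0], target=[1.0, 3.0], confidence=conf)
```

In the toy example the ensemble disagrees on the second pixel, so its error is down-weighted and the loss is smaller than the unweighted mean absolute error over the same two pixels.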

Taxonomy

Topics: Advanced Vision and Imaging · Advanced Image Processing Techniques · Video Coding and Compression Technologies