EndoUFM: Utilizing Foundation Models for Monocular depth estimation of endoscopic images

Xinning Yao; Bo Liu; Bojian Li; Jingjing Wang; Jinghua Yue; Fugen Zhou

arXiv:2508.17916·cs.CV·August 26, 2025

TL;DR

EndoUFM is an unsupervised framework that combines dual foundation models with adaptive fine-tuning to improve monocular depth estimation in endoscopic images, in support of surgical navigation.

Contribution

The paper introduces EndoUFM, a novel unsupervised depth estimation method that integrates dual foundation models with RVLoRA and Res-DSC for enhanced performance in endoscopic scenes.

Findings

01  Achieves state-of-the-art accuracy on multiple datasets

02  Maintains efficient model size for real-time applications

03  Enhances depth perception for surgical navigation

Abstract

Depth estimation is a foundational component for 3D reconstruction in minimally invasive endoscopic surgeries. However, existing monocular depth estimation techniques often perform poorly under the varying illumination and complex textures of the surgical environment. While powerful visual foundation models offer a promising solution, their training on natural images leads to significant domain adaptability limitations and semantic perception deficiencies when applied to endoscopy. In this study, we introduce EndoUFM, an unsupervised monocular depth estimation framework that integrates dual foundation models for surgical scenes, enhancing depth estimation performance by leveraging their powerful pre-learned priors. The framework features a novel adaptive fine-tuning strategy that incorporates Random Vector Low-Rank Adaptation (RVLoRA) to enhance model…
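The abstract names RVLoRA, but this excerpt is cut off before describing it. As rough orientation only, the sketch below shows plain low-rank adaptation (LoRA), the general idea the name builds on: a frozen pretrained weight receives a trainable update factored into two small matrices. It omits the random-vector component entirely and is not the authors' implementation. `LoRALinear`, `rank`, `alpha`, and the initialization are illustrative assumptions.

```python
import numpy as np


class LoRALinear:
    """Illustrative layer: frozen weight W plus a low-rank update (alpha / rank) * B @ A.

    Hypothetical sketch of standard LoRA, not the RVLoRA variant from the paper.
    """

    def __init__(self, weight, rank=4, alpha=8.0, seed=0):
        rng = np.random.default_rng(seed)
        out_dim, in_dim = weight.shape
        self.weight = weight                                  # frozen pretrained weight
        self.A = rng.normal(0.0, 0.01, size=(rank, in_dim))   # trainable down-projection
        self.B = np.zeros((out_dim, rank))                    # trainable up-projection, zero init
        self.scale = alpha / rank

    def delta(self):
        # Low-rank update added to the frozen weight; its rank is at most `rank`.
        return self.scale * (self.B @ self.A)

    def __call__(self, x):
        # x has shape (batch, in_dim); output has shape (batch, out_dim).
        return x @ (self.weight + self.delta()).T

    def trainable_params(self):
        # Only A and B would be updated during fine-tuning.
        return self.A.size + self.B.size


layer = LoRALinear(np.ones((64, 64)), rank=4)
print(layer.trainable_params(), "trainable vs", layer.weight.size, "frozen")
```

Because `B` starts at zero, the adapted layer initially reproduces the pretrained layer exactly, and only the two small factors would need to be trained and stored.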
