EndoUFM: Utilizing Foundation Models for Monocular depth estimation of endoscopic images
Xinning Yao, Bo Liu, Bojian Li, Jingjing Wang, Jinghua Yue, Fugen Zhou

TL;DR
EndoUFM is an unsupervised framework that combines two foundation models with adaptive fine-tuning to improve monocular depth estimation in endoscopic images, in support of surgical navigation.
Contribution
The paper introduces EndoUFM, a novel unsupervised depth estimation method that integrates dual foundation models with RVLoRA and Res-DSC for enhanced performance in endoscopic scenes.
Findings
Achieves state-of-the-art accuracy on multiple datasets
Maintains efficient model size for real-time applications
Enhances depth perception for surgical navigation
Abstract
Depth estimation is a foundational component for 3D reconstruction in minimally invasive endoscopic surgeries. However, existing monocular depth estimation techniques often perform poorly under the varying illumination and complex textures of the surgical environment. While powerful visual foundation models offer a promising solution, their training on natural images leads to significant domain adaptability limitations and semantic perception deficiencies when applied to endoscopy. In this study, we introduce EndoUFM, an unsupervised monocular depth estimation framework that integrates dual foundation models for surgical scenes and enhances depth estimation performance by leveraging their powerful pre-learned priors. The framework features a novel adaptive fine-tuning strategy that incorporates Random Vector Low-Rank Adaptation (RVLoRA) to enhance model…
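For readers unfamiliar with low-rank adaptation, the sketch below may help illustrate the general idea that RVLoRA builds on. It shows plain LoRA only: a frozen weight matrix W plus a trainable update (alpha / r) * B @ A of rank r. The random-vector component of RVLoRA is not described in the truncated abstract above and is not modeled here. The class name `LoRALinear`, the helper functions, and all parameter values are hypothetical choices for illustration, not the authors' implementation.

```python
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def add(a, b):
    """Element-wise sum of two equally shaped matrices."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def scale(a, s):
    """Multiply every entry of a matrix by the scalar s."""
    return [[x * s for x in row] for row in a]


class LoRALinear:
    """Hypothetical plain-LoRA layer (not RVLoRA): W stays frozen, only A and B train."""

    def __init__(self, weight, rank, alpha=1.0, seed=0):
        rng = random.Random(seed)
        d_out, d_in = len(weight), len(weight[0])
        self.weight = weight              # frozen pre-trained weight, d_out x d_in
        self.rank = rank
        self.scaling = alpha / rank
        # Standard LoRA init: A is small random, B is zero, so the update starts at zero.
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(rank)]
        self.B = [[0.0] * rank for _ in range(d_out)]

    def effective_weight(self):
        """Return W + (alpha / r) * B @ A."""
        return add(self.weight, scale(matmul(self.B, self.A), self.scaling))

    def forward(self, x):
        """Apply the adapted layer to x (batch x d_in); returns batch x d_out."""
        w_t = [list(col) for col in zip(*self.effective_weight())]
        return matmul(x, w_t)

    def trainable_params(self):
        """Number of entries in A and B, the only parameters that would be updated."""
        d_out, d_in = len(self.weight), len(self.weight[0])
        return self.rank * (d_in + d_out)
```

Because B starts at zero, the adapted layer initially reproduces the frozen layer exactly, and fine-tuning only has to learn r * (d_in + d_out) values instead of d_in * d_out. For an assumed 768 x 768 layer with r = 8, that is 12,288 trainable values against 589,824 in the full matrix.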
