Bridging the Ex-Vivo to In-Vivo Gap: Synthetic Priors for Monocular Depth Estimation in Specular Surgical Environments
Ankan Aich, Emma D. Ryan, Kris Moe, Isaac Schmale, Li-Xing Man, and Yangming Lee

TL;DR
This paper introduces a monocular depth estimation method for surgical environments that uses synthetic priors and domain adaptation to bridge the gap between ex-vivo and in-vivo settings, achieving state-of-the-art results on the SCARED dataset.
Contribution
It leverages synthetic depth priors with domain adaptation to improve in-vivo surgical depth estimation and introduces a new real-surgery validation dataset.
Findings
Achieves state-of-the-art on the SCARED dataset.
Reduces Squared Relative Error by over 17% in high-specularity regimes.
Demonstrates superior robustness on the ROCAL-T 90 dataset.
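For readers unfamiliar with the metric named in the findings, here is a minimal sketch of Squared Relative Error as it is commonly defined in the depth-estimation literature: the mean of (prediction - ground truth)^2 / ground truth over pixels with valid ground truth. The function names, the flat-list input format, and the `min_depth` validity threshold are assumptions made for illustration. The paper's exact evaluation protocol, and its definition of a high-specularity regime, are not given in this summary.

```python
def squared_relative_error(pred, gt, min_depth=1e-3):
    """Mean of (pred - gt)^2 / gt over pixels with valid ground truth.

    pred, gt: flat sequences of depth values in the same units.
    min_depth: hypothetical threshold below which ground truth is ignored.
    """
    total, count = 0.0, 0
    for p, g in zip(pred, gt):
        if g > min_depth:  # skip pixels without usable ground truth
            total += (p - g) ** 2 / g
            count += 1
    if count == 0:
        raise ValueError("no valid ground-truth pixels")
    return total / count


def relative_reduction(baseline, improved):
    """Fractional reduction of an error metric relative to a baseline."""
    return (baseline - improved) / baseline
```

Under this definition, a reduction of over 17% means `relative_reduction(baseline, improved)` exceeds 0.17 when both errors are computed on the same set of pixels.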
Abstract
Accurate Monocular Depth Estimation (MDE) is critical for autonomous robotic surgery. However, existing self-supervised methods often exhibit a severe "ex-vivo to in-vivo gap": they achieve high accuracy on public datasets but struggle in actual clinical deployments. This disparity arises from the severe specular reflections and fluid-filled deformations inherent to real surgeries. Models trained on noisy real-world pseudo-labels consequently suffer from severe boundary collapse. To address this, we leverage the high-fidelity synthetic priors of the Depth Anything V2 architecture, which inherently capture precise geometric details, and efficiently adapt them to the medical domain using Dynamic Vector Low-Rank Adaptation (DV-LORA). Our contributions are two-fold. Technically, our approach establishes a new state-of-the-art on the public SCARED dataset; under a novel…
