Monocular absolute depth estimation from endoscopy via domain-invariant feature learning and latent consistency
Hao Li, Daiwei Lu, Jesse d'Almeida, Dilara Isik, Ehsan Khodapanah Aghdam, Nick DiSanto, Ayberk Acar, Susheela Sharma, Jie Ying Wu, Robert J. Webster III, Ipek Oguz

TL;DR
This paper introduces a domain-invariant feature learning approach for monocular absolute depth estimation in endoscopy. It reduces the domain gap between real and translated synthetic images and improves depth accuracy in surgical scenes.
Contribution
It proposes a latent feature alignment method that is agnostic to image translation, enhancing depth estimation across real and synthetic endoscopic images.
Findings
Outperforms state-of-the-art methods in absolute and relative depth metrics
Improves depth estimation across various backbone networks
Demonstrates effectiveness on endoscopic videos of central airway phantoms
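For readers unfamiliar with the distinction between absolute and relative depth metrics, the following is a minimal sketch using two widely used error measures (absolute relative error and RMSE). It assumes, as is common in the depth estimation literature, that "absolute" evaluation compares raw metric predictions to ground truth, while "relative" evaluation first rescales the prediction by the ratio of medians. The function names and the median-scaling convention are illustrative assumptions and are not taken from the paper.

```python
import math
import statistics


def _valid_pairs(pred, gt):
    # Keep only pixels with a valid (positive) ground-truth depth.
    return [(p, g) for p, g in zip(pred, gt) if g > 0]


def abs_rel(pred, gt):
    """Mean of |pred - gt| / gt over valid pixels."""
    pairs = _valid_pairs(pred, gt)
    return sum(abs(p - g) / g for p, g in pairs) / len(pairs)


def rmse(pred, gt):
    """Root mean squared error over valid pixels, in the units of gt."""
    pairs = _valid_pairs(pred, gt)
    return math.sqrt(sum((p - g) ** 2 for p, g in pairs) / len(pairs))


def median_scale(pred, gt):
    """Rescale pred so its median matches gt (assumed relative-depth convention)."""
    pairs = _valid_pairs(pred, gt)
    scale = statistics.median(g for _, g in pairs) / statistics.median(p for p, _ in pairs)
    return [p * scale for p in pred]


# Hypothetical flattened depth values: the prediction has the right
# structure but is off by a global factor of 2.
pred = [1.0, 2.0, 4.0]
gt = [2.0, 4.0, 8.0]

absolute_error = abs_rel(pred, gt)                       # penalizes the wrong scale
relative_error = abs_rel(median_scale(pred, gt), gt)     # scale is factored out
```

In this toy case the absolute metric reports a large error while the relative metric reports none, which is why a method can score well on relative depth yet still be unsuitable for tasks that need metric distances.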
Abstract
Monocular depth estimation (MDE) is a critical task to guide autonomous medical robots. However, obtaining absolute (metric) depth from an endoscopy camera in surgical scenes is difficult, which limits supervised learning of depth on real endoscopic images. Current image-level unsupervised domain adaptation methods translate synthetic images with known depth maps into the style of real endoscopic frames and train depth networks using these translated images with their corresponding depth maps. However, a domain gap often remains between real and translated synthetic images. In this paper, we present a latent feature alignment method to improve absolute depth estimation by reducing this domain gap in the context of endoscopic videos of the central airway. Our methods are agnostic to the image translation process and focus on the depth estimation itself. Specifically, the depth network…
Taxonomy
Topics: Advanced Vision and Imaging · Medical Image Segmentation Techniques · Video Coding and Compression Technologies
