Self-supervised Monocular Depth and Pose Estimation for Endoscopy with Generative Latent Priors

Ziang Xu; Bin Li; Yang Hu; Chenyu Zhang; James East; Sharib Ali; Jens Rittscher

arXiv:2411.17790·cs.CV·December 10, 2024

Open Access

TL;DR

This paper introduces a self-supervised framework for monocular depth and pose estimation in endoscopy. It uses generative latent priors and a Variational Autoencoder (VAE) to improve accuracy and robustness under challenging conditions in the gastrointestinal (GI) tract.

Contribution

It presents a novel approach combining a Generative Latent Bank and a VAE for enhanced depth and pose estimation in endoscopy, addressing generalizability issues of prior methods.

Findings

01 Outperforms existing self-supervised methods on endoscopic datasets

02 Improves depth prediction realism and robustness

03 Enhances pose estimation stability and accuracy

Abstract

Accurate 3D mapping in endoscopy enables quantitative, holistic lesion characterization within the gastrointestinal (GI) tract, requiring reliable depth and pose estimation. However, endoscopy systems are monocular, and existing methods relying on synthetic datasets or complex models often lack generalizability in challenging endoscopic conditions. We propose a robust self-supervised monocular depth and pose estimation framework that incorporates a Generative Latent Bank and a Variational Autoencoder (VAE). The Generative Latent Bank leverages extensive depth scenes from natural images to condition the depth network, enhancing realism and robustness of depth predictions through latent feature priors. For pose estimation, we reformulate it within a VAE framework, treating pose transitions as latent variables to regularize scale, stabilize z-axis prominence, and improve x-y sensitivity.…
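To make the idea of treating pose transitions as latent variables more concrete, the following is a minimal sketch of the two generic VAE ingredients such a formulation would typically rely on: reparameterized sampling and a KL regularizer. It is not the authors' implementation. The 6-dimensional pose latent (3 translation and 3 rotation components), the standard normal prior, the numeric values, and the function names `sample_pose_latent` and `kl_to_standard_normal` are all assumptions made for illustration.

```python
import math
import random


def sample_pose_latent(mu, log_var, rng):
    """Reparameterization trick: z = mu + sigma * eps, with eps ~ N(0, 1)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, log_var)]


def kl_to_standard_normal(mu, log_var):
    """KL( N(mu, diag(sigma^2)) || N(0, I) ), summed over latent dimensions."""
    return 0.5 * sum(m * m + math.exp(lv) - 1.0 - lv
                     for m, lv in zip(mu, log_var))


# Hypothetical encoder output for one pair of consecutive frames
# (assumed layout: tx, ty, tz, rx, ry, rz). Values are made up.
mu = [0.01, -0.02, 0.30, 0.001, 0.002, -0.001]
log_var = [-4.0] * 6

rng = random.Random(0)                     # fixed seed for repeatability
pose = sample_pose_latent(mu, log_var, rng)
kl = kl_to_standard_normal(mu, log_var)    # would be added to the training loss
```

In a setup like this, the KL term pulls the predicted pose distribution toward the prior, which is one plausible way a VAE formulation could regularize the scale of predicted motion. How the paper weights this term or structures the prior is not stated in the abstract.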

Taxonomy

Topics: Colorectal Cancer Screening and Detection