Structure-preserving Image Translation for Depth Estimation in Colonoscopy Video
Shuxian Wang, Akshay Paruchuri, Zhaoxi Zhang, Sarah McGill, Roni Sengupta

TL;DR
This paper introduces a structure-preserving sim2real image translation pipeline that narrows the domain gap between synthetic and real colonoscopy images, so that depth estimators trained on the translated synthetic data generalize better to clinical video.
Contribution
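It proposes a sim2real image translation pipeline that retains depth geometry through translation, enabling supervised depth estimation on realistic-looking synthetic images, together with a dataset of hand-picked clinical colonoscopy sequences used to improve the translation.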
Findings
Improved depth estimation accuracy on clinical datasets.
Realistic and structure-preserving translated images.
Enhanced generalization of depth models to real clinical data.
Abstract
Monocular depth estimation in colonoscopy video aims to overcome the unusual lighting properties of the colonoscopic environment. One of the major challenges in this area is the domain gap between annotated but unrealistic synthetic data and unannotated but realistic clinical data. Previous attempts to bridge this domain gap directly target the depth estimation task itself. We propose a general pipeline of structure-preserving synthetic-to-real (sim2real) image translation (producing a modified version of the input image) to retain depth geometry through the translation process. This allows us to generate large quantities of realistic-looking synthetic images for supervised depth estimation with improved generalization to the clinical domain. We also propose a dataset of hand-picked sequences from clinical colonoscopies to improve the image translation process. We demonstrate the…
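The data flow described in the abstract can be sketched roughly as follows: each annotated synthetic image is translated toward the clinical appearance, and the original depth map is reused as its label because the translation is meant to leave geometry unchanged. This is a minimal illustration under stated assumptions, not the authors' implementation. The names `toy_translate` and `build_training_pairs` are hypothetical, and the per-pixel colour remap stands in for the learned translation model, whose details the abstract does not give.

```python
from typing import Callable, List, Tuple

Pixel = Tuple[float, float, float]   # RGB values in [0, 1]
Image = List[List[Pixel]]            # H x W grid of pixels
Depth = List[List[float]]            # H x W grid of depth values


def toy_translate(image: Image) -> Image:
    """Hypothetical stand-in for a learned sim2real translator.

    It remaps colours pixel by pixel and never moves a pixel, so the
    scene geometry (and therefore the depth map) stays aligned.
    """
    return [
        [(min(1.0, 0.9 * r + 0.1), 0.6 * g, 0.5 * b) for (r, g, b) in row]
        for row in image
    ]


def build_training_pairs(
    synthetic: List[Tuple[Image, Depth]],
    translate: Callable[[Image], Image],
) -> List[Tuple[Image, Depth]]:
    """Pair each translated image with the depth map of its synthetic source."""
    pairs = []
    for image, depth in synthetic:
        translated = translate(image)
        # The depth label is only reusable if the spatial layout is unchanged.
        same_shape = len(translated) == len(depth) and all(
            len(t_row) == len(d_row) for t_row, d_row in zip(translated, depth)
        )
        if not same_shape:
            raise ValueError("translation changed the image layout")
        pairs.append((translated, depth))
    return pairs
```

The resulting pairs would then feed an ordinary supervised depth estimator. In a real system the shape check is only a necessary condition: a learned translator can keep the image size while still distorting structure, which is the failure the structure-preserving design is meant to avoid.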
Taxonomy
Topics: Colorectal Cancer Screening and Detection
