ColonAdapter: Geometry Estimation Through Foundation Model Adaptation for Colonoscopy
Zhiyi Jiang, Yifu Wang, Xuelian Cheng, Zongyuan Ge

TL;DR
This paper introduces ColonAdapter, a self-supervised framework that adapts foundation models for accurate 3D geometry estimation in colonoscopy images, overcoming challenges posed by clinical scene textures and lighting.
Contribution
We propose a self-supervised fine-tuning method for geometric foundation models, with a Detail Restoration Module (DRM) and a geometry consistency loss tailored for colonoscopy data, improving geometric estimation accuracy without ground-truth parameters.
Findings
Achieves state-of-the-art results in camera pose estimation
Improves monocular depth prediction accuracy
Enhances dense 3D point map reconstruction
Abstract
Estimating 3D geometry from monocular colonoscopy images is challenging due to non-Lambertian surfaces, moving light sources, and large textureless regions. While recent 3D geometric foundation models eliminate the need for multi-stage pipelines, their performance deteriorates in clinical scenes. These models are primarily trained on natural scene datasets and struggle with specularity and homogeneous textures typical in colonoscopy, leading to inaccurate geometry estimation. In this paper, we present ColonAdapter, a self-supervised fine-tuning framework that adapts geometric foundation models for colonoscopy geometry estimation. Our method leverages pretrained geometric priors while tailoring them to clinical data. To improve performance in low-texture regions and ensure scale consistency, we introduce a Detail Restoration Module (DRM) and a geometry consistency loss. Furthermore, a…
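The abstract names a geometry consistency loss for scale consistency, but this summary does not define it. As a rough illustration only, the sketch below shows one formulation commonly used in self-supervised depth estimation: a normalized difference between a depth map warped from a neighboring view and the depth predicted for the current view. This is an assumed stand-in, not necessarily the loss used in ColonAdapter; the function name, the flat-list input layout, and the `eps` parameter are hypothetical.

```python
from typing import Sequence


def geometry_consistency_loss(
    depth_warped: Sequence[float],
    depth_pred: Sequence[float],
    eps: float = 1e-8,
) -> float:
    """Mean of |a - b| / (a + b) over pixels where both depths are positive.

    Illustrative assumption: depth_warped holds depths from a neighboring
    frame already projected into the current view, and depth_pred holds
    the depths predicted for the current view, both flattened to 1D.
    """
    if len(depth_warped) != len(depth_pred):
        raise ValueError("depth maps must have the same number of pixels")

    terms = []
    for a, b in zip(depth_warped, depth_pred):
        # Skip invalid pixels (e.g. points that fall outside the view).
        if a > 0 and b > 0:
            terms.append(abs(a - b) / (a + b + eps))

    # No valid overlap between the two views: contribute nothing.
    return sum(terms) / len(terms) if terms else 0.0
```

The normalization by `a + b` keeps each term between 0 and 1, so the penalty reflects the relative disagreement between the two views rather than the absolute depth. Two predictions that agree exactly give a loss of zero, while a uniform scale mismatch between views gives a constant nonzero penalty, which is what pushes predictions toward a consistent scale across frames.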
Taxonomy
Topics: Robotics and Sensor-Based Localization · Advanced Image and Video Retrieval Techniques · Colorectal Cancer Screening and Detection
