Generalizing monocular colonoscopy image depth estimation by uncertainty-based global and local fusion network
Sijia Du, Chengfeng Zhou, Suncheng Xiang, Jianwei Xu, Dahong Qian

TL;DR
This paper introduces a CNN-Transformer framework with an uncertainty-based fusion block that improves monocular depth estimation on colonoscopy images and generalizes to real clinical data without fine-tuning.
Contribution
The study presents a deep learning architecture that combines local (CNN) and global (Transformer) features through uncertainty-based fusion, improving how well depth estimation generalizes to endoscopic images.
Findings
Excellent generalization across multiple datasets
Robust depth estimation in real clinical scenarios
No fine-tuning required for unseen data
Abstract
Objective: Depth estimation is crucial for endoscopic navigation and manipulation, but obtaining ground-truth depth maps in real clinical scenarios, such as the colon, is challenging. This study aims to develop a robust framework that generalizes well to real colonoscopy images, overcoming challenges like non-Lambertian surface reflection and diverse data distributions. Methods: We propose a framework combining a convolutional neural network (CNN) for capturing local features and a Transformer for capturing global information. An uncertainty-based fusion block was designed to enhance generalization by identifying complementary contributions from the CNN and Transformer branches. The network can be trained with simulated datasets and generalize directly to unseen clinical data without any fine-tuning. Results: Our method is validated on multiple datasets and demonstrates an excellent…
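The abstract describes the fusion block only at a high level, so the sketch below is a hedged illustration of one common uncertainty-weighting scheme, not the authors' block. It assumes each branch (CNN and Transformer) outputs a per-pixel depth and a log-variance. The two predictions are fused by inverse-variance weighting (a softmax over negative log-variance), and the log-variance is learned with a heteroscedastic Gaussian negative log-likelihood. The function names `uncertainty_fuse` and `gaussian_nll` are hypothetical. A real model would operate on tensors rather than Python lists.

```python
import math
from typing import List, Tuple


def uncertainty_fuse(
    depth_cnn: List[float],
    logvar_cnn: List[float],
    depth_trans: List[float],
    logvar_trans: List[float],
) -> Tuple[List[float], List[float]]:
    """Hypothetical per-pixel fusion of two depth branches by inverse variance.

    Assumption: each branch predicts depth plus log-variance per pixel.
    The branch with the lower predicted variance receives the larger weight.
    Returns (fused_depth, weight_given_to_cnn_branch).
    """
    n = len(depth_cnn)
    if not (len(logvar_cnn) == len(depth_trans) == len(logvar_trans) == n):
        raise ValueError("all inputs must have the same number of pixels")

    fused: List[float] = []
    cnn_weights: List[float] = []
    for d_c, s_c, d_t, s_t in zip(depth_cnn, logvar_cnn, depth_trans, logvar_trans):
        # Softmax over negative log-variance == normalized inverse variances.
        # Subtract the max logit for numerical stability.
        m = max(-s_c, -s_t)
        p_c = math.exp(-s_c - m)
        p_t = math.exp(-s_t - m)
        w_c = p_c / (p_c + p_t)
        fused.append(w_c * d_c + (1.0 - w_c) * d_t)
        cnn_weights.append(w_c)
    return fused, cnn_weights


def gaussian_nll(pred: List[float], logvar: List[float], target: List[float]) -> float:
    """Heteroscedastic Gaussian NLL (constant term dropped), averaged over pixels.

    Large errors can be down-weighted by predicting a large variance,
    but the + logvar term penalizes claiming high uncertainty everywhere.
    """
    if not (len(pred) == len(logvar) == len(target)) or not pred:
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for p, s, t in zip(pred, logvar, target):
        total += 0.5 * (math.exp(-s) * (p - t) ** 2 + s)
    return total / len(pred)


if __name__ == "__main__":
    # Pixel 0: equal uncertainty, so the branches are averaged.
    # Pixel 1: the CNN branch is less certain, so the Transformer dominates.
    fused, w = uncertainty_fuse([1.0, 2.0], [0.0, 2.0], [3.0, 4.0], [0.0, 0.0])
    print(fused, w)
```

In this design, the weights are computed per pixel, so each branch can dominate in different image regions, which is one way to let local and global features contribute complementarily.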
Taxonomy
Topics: Colorectal Cancer Screening and Detection · Image Processing Techniques and Applications · Image Retrieval and Classification Techniques
Methods: Attention Is All You Need · Linear Layer · Position-Wise Feed-Forward Layer · Label Smoothing · Byte Pair Encoding · Absolute Position Encodings · Softmax · Layer Normalization · Dropout · Dense Connections
