TL;DR
This paper audits how reliably models classify polyp size from monocular colonoscopy images. It finds that models rely on cues tied to examination behavior rather than true metric scale, and it identifies scale recovery and segmentation-mask robustness as the key bottlenecks.
Contribution
It provides a diagnostic audit of polyp sizing models, introduces evaluation tools, and quantifies how scale and mask errors affect model performance.
Findings
Models rely on cues correlated with examination behavior rather than true size.
Current depth estimation offers limited gains in size accuracy.
Segmentation errors under distribution shift significantly impair size classification.
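To illustrate why true size is hard to recover from a single view, here is a minimal sketch of the scale ambiguity under a simple pinhole camera model. This is not the paper's method: the function names, the focal length, and the depths are hypothetical values chosen for illustration only.

```python
def metric_extent_mm(pixel_extent: float, depth_mm: float, focal_px: float) -> float:
    """Pinhole approximation: physical extent = pixel extent * depth / focal length.

    Assumes a fronto-parallel object and a focal length expressed in pixels.
    """
    return pixel_extent * depth_mm / focal_px


def size_class(extent_mm: float, threshold_mm: float = 5.0) -> str:
    """Binary size label using the 5 mm threshold from the summary."""
    return ">5 mm" if extent_mm > threshold_mm else "<=5 mm"


# Hypothetical numbers: the same 100-pixel polyp seen at two distances.
near = metric_extent_mm(pixel_extent=100, depth_mm=10, focal_px=500)
far = metric_extent_mm(pixel_extent=100, depth_mm=30, focal_px=500)

print(near, size_class(near))
print(far, size_class(far))
```

An identical pixel footprint lands on either side of the 5 mm threshold depending on absolute depth. Relative depth, which is only defined up to an unknown scale factor, cannot resolve this on its own.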
Abstract
Accurate polyp size stratification guides surveillance decisions, with lesions larger than 5 mm typically requiring closer follow-up. However, monocular colonoscopy lacks a reliable metric reference. We present a diagnostic audit of binary polyp size classification (<=5 mm vs. >5 mm) across multiple public multi-center datasets, model families, and patient-stratified cross-validation. Across architectures and input modalities, including RGB appearance, relative depth, and photometry, model performance is moderately consistent, suggesting reliance on cues correlated with examination behavior rather than true metric scales. By providing ground-truth scale at varying granularities, we quantify the potential improvement from perfect scale information and show that current depth estimation and global calibration offer limited gains. We further demonstrate that segmentation errors under…
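The patient-level cross-validation mentioned in the abstract can be sketched as follows, assuming it means that all frames from one patient stay in the same fold so that no patient appears in both training and test data. The helper `patient_folds` and the sample IDs are hypothetical; the paper's actual splitting protocol may differ, for example by also balancing size classes across folds.

```python
def patient_folds(patient_ids, n_folds):
    """Assign sample indices to folds so that each patient lives in exactly one fold.

    Patients are distributed round-robin in first-seen order (an illustrative choice).
    """
    unique_patients = list(dict.fromkeys(patient_ids))  # de-duplicate, keep order
    fold_of = {pid: i % n_folds for i, pid in enumerate(unique_patients)}
    folds = [[] for _ in range(n_folds)]
    for idx, pid in enumerate(patient_ids):
        folds[fold_of[pid]].append(idx)
    return folds


# Hypothetical sample-to-patient mapping: six frames from four patients.
patient_ids = ["p1", "p1", "p2", "p3", "p2", "p4"]
folds = patient_folds(patient_ids, n_folds=2)

for k, test_idx in enumerate(folds):
    train_idx = [i for i in range(len(patient_ids)) if i not in test_idx]
    test_patients = {patient_ids[i] for i in test_idx}
    train_patients = {patient_ids[i] for i in train_idx}
    # No patient may leak between the training and test sides of a fold.
    assert test_patients.isdisjoint(train_patients)
    print(k, sorted(test_patients))
```

Splitting by patient rather than by frame matters here because frames from one examination share appearance and camera-handling patterns, which could otherwise inflate test performance.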
