Colon-Bench: An Agentic Workflow for Scalable Dense Lesion Annotation in Full-Procedure Colonoscopy Videos
Abdullah Hamdi, Changchun Yang, Xin Gao

TL;DR
Colon-Bench introduces a comprehensive, multi-stage annotation pipeline for full-procedure colonoscopy videos, creating an extensive benchmark dataset that enables evaluation of advanced multimodal language models in medical imaging.
Contribution
We present Colon-Bench, a novel scalable annotation workflow and benchmark dataset for colonoscopy videos, facilitating the development and evaluation of multimodal AI models in medical diagnostics.
Findings
MLLMs achieve high localization accuracy in colonoscopy videos
Colon-skill prompting improves the performance of zero-shot MLLMs
The dataset includes over 300,000 bounding boxes and 213,000 segmentation masks
Abstract
Early screening via colonoscopy is critical for colon cancer prevention, yet developing robust AI systems for this domain is hindered by the lack of densely annotated, long-sequence video datasets. Existing datasets predominantly focus on single-class polyp detection and lack the rich spatial, temporal, and linguistic annotations required to evaluate modern Multimodal Large Language Models (MLLMs). To address this critical gap, we introduce Colon-Bench, generated via a novel multi-stage agentic workflow. Our pipeline seamlessly integrates temporal proposals, bounding-box tracking, AI-driven visual confirmation, and human-in-the-loop review to scalably annotate full-procedure videos. The resulting verified benchmark is unprecedented in scope, encompassing 528 videos, 14 distinct lesion categories (including polyps, ulcers, and bleeding), over 300,000 bounding boxes, 213,000 segmentation…
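The four stages named in the abstract (temporal proposals, bounding-box tracking, AI-driven visual confirmation, human-in-the-loop review) can be pictured as a simple sequential filter. The sketch below is only an illustration under stated assumptions, not the authors' implementation: the `Candidate` schema, the `annotate_video` function, the stage callables, and the rule that a candidate is dropped as soon as any stage rejects it are all hypothetical. The toy stage functions stand in for real models and reviewers.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

Box = Tuple[int, int, int, int]  # (x1, y1, x2, y2) in pixels


@dataclass
class Candidate:
    """One lesion candidate flowing through the stages (hypothetical schema)."""
    label: str
    start_frame: int
    end_frame: int
    boxes: Dict[int, Box] = field(default_factory=dict)  # frame index -> box
    ai_confirmed: bool = False
    human_verified: bool = False


def annotate_video(
    num_frames: int,
    propose: Callable[[int], List[Candidate]],
    track: Callable[[Candidate], Dict[int, Box]],
    confirm: Callable[[Candidate], bool],
    review: Callable[[Candidate], bool],
) -> List[Candidate]:
    """Run the four stages in order (assumed); drop a candidate once a stage rejects it."""
    verified: List[Candidate] = []
    for cand in propose(num_frames):        # 1. temporal proposals
        cand.boxes = track(cand)            # 2. bounding-box tracking
        if not cand.boxes:
            continue
        cand.ai_confirmed = confirm(cand)   # 3. AI-driven visual confirmation
        if not cand.ai_confirmed:
            continue
        cand.human_verified = review(cand)  # 4. human-in-the-loop review
        if cand.human_verified:
            verified.append(cand)
    return verified


# Toy stand-ins for the real stages (all hypothetical).
def toy_propose(num_frames: int) -> List[Candidate]:
    return [
        Candidate("polyp", 10, 12),
        Candidate("ulcer", 40, 41),
        Candidate("bleeding", 70, 70),
    ]


def toy_track(cand: Candidate) -> Dict[int, Box]:
    # Pretend the tracker returns the same box on every frame of the proposal.
    return {f: (0, 0, 10, 10) for f in range(cand.start_frame, cand.end_frame + 1)}


def toy_confirm(cand: Candidate) -> bool:
    # Pretend the visual-confirmation model rejects the ulcer proposal.
    return cand.label != "ulcer"


def toy_review(cand: Candidate) -> bool:
    # Pretend the human reviewer only accepts tracks spanning at least two frames.
    return len(cand.boxes) >= 2


result = annotate_video(100, toy_propose, toy_track, toy_confirm, toy_review)
print([c.label for c in result])
```

Passing the stages in as callables keeps the sketch independent of any particular detector, tracker, or MLLM; swapping a stage only means passing a different function.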
Taxonomy
Topics: Colorectal Cancer Screening and Detection · Multimodal Machine Learning Applications · COVID-19 diagnosis using AI
