Colon-X: Advancing Intelligent Colonoscopy toward Clinical Reasoning
Ge-Peng Ji, Jingyi Liu, Deng-Ping Fan, Huazhu Fu, Nick Barnes

TL;DR
This paper introduces Colon-X, an open initiative that pairs a large multimodal dataset with models for colonoscopy. It emphasizes the transition from multimodal understanding to clinical reasoning and reports improved reasoning accuracy in colonoscopy analysis.
Contribution
The paper presents ColonVQA, described as the most comprehensive multimodal colonoscopy dataset to date, and develops ColonR1, a reasoning-focused model that outperforms supervised fine-tuning under data scarcity.
Findings
Across 22 evaluated multimodal large language models, clinical outputs from leading models remain far from robust and trustworthy.
ColonR1 achieves 56.61% accuracy under data scarcity, surpassing supervised fine-tuning by 25.22%.
The data and models are publicly available for colonoscopy AI research.
Abstract
In this study, we present Colon-X, an open initiative aimed at advancing multimodal intelligence in colonoscopy. We begin by constructing ColonVQA, the most comprehensive multimodal dataset ever built for colonoscopy, featuring over 1.1M visual question answering entries across 76 clinical findings and 18 multimodal tasks. Beyond serving as a community-wide data foundation, we further investigate a critical yet underexplored transition in colonoscopy: evolving from multimodal understanding to clinical reasoning. (a) To capture the current landscape of multimodal understanding behaviors, we systematically assess the generalizability of 22 multimodal large language models (MLLMs) and examine their reliability under human-induced perturbations. The results reveal that clinical outputs from leading MLLMs remain far from robust and trustworthy. (b) To narrow this gap, we further explore…
Taxonomy
Topics: Multimodal Machine Learning Applications · Colorectal Cancer Screening and Detection · Topic Modeling
