Frontiers in Intelligent Colonoscopy
Ge-Peng Ji, Jingyi Liu, Peng Xu, Nick Barnes, Fahad Shahbaz Khan, Salman Khan, Deng-Ping Fan

TL;DR
This paper reviews current advances in intelligent colonoscopy, introduces new multimodal datasets and models, and discusses future research directions to improve colorectal cancer screening.
Contribution
It presents a comprehensive assessment of colonoscopy perception tasks, introduces ColonINST and ColonGPT, and establishes a multimodal benchmark for the field.
Findings
Multimodal research in colonoscopy is still open for exploration.
The paper introduces a large-scale multimodal dataset, ColonINST.
A new colonoscopy-specific multimodal language model, ColonGPT, is proposed.
Abstract
Colonoscopy is currently one of the most sensitive screening methods for colorectal cancer. This study investigates the frontiers of intelligent colonoscopy techniques and their prospective implications for multimodal medical applications. To this end, we begin by assessing the current data-centric and model-centric landscapes through four tasks for colonoscopic scene perception, namely classification, detection, segmentation, and vision-language understanding. This assessment enables us to identify domain-specific challenges and reveals that multimodal research in colonoscopy remains open for further exploration. To embrace the coming multimodal era, we establish three foundational initiatives: a large-scale multimodal instruction tuning dataset ColonINST, a colonoscopy-designed multimodal language model ColonGPT, and a multimodal benchmark. To facilitate ongoing monitoring of…
Taxonomy
Topics: Colorectal Cancer Screening and Detection
