Poster Session I - A55 PERFORMANCE OF LARGE LANGUAGE MODELS IN THE OPTICAL DIAGNOSIS OF COLORECTAL POLYPS
J C Vences, W T Tran, N Gimpaya, C M Walsh, R Khan, R Bechara, D von Renteln, S Grover

TL;DR
This study evaluates how well large language models can help diagnose colorectal polyps using optical methods, comparing their performance to expert guidelines and traditional systems.
Contribution
The paper presents the first evaluation of large language models applying the NICE and JNET classification systems to colorectal polyp diagnosis.
Findings
Claude Opus 4 and GPT-5 showed the highest accuracy in classifying polyps using the Paris system.
Across all tested models, percent-correct scores were highest for the NICE classification.
Sensitivity and specificity of the multimodal large language models (MLLMs) did not meet ESGE standards, indicating a need for further refinement before clinical use.
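As a rough illustration of the metrics named in these findings, the sketch below computes sensitivity, specificity, and percent correct from paired model predictions and histology labels, then checks them against caller-supplied minimums. It is not the study's analysis code: the label strings, the function names, and the threshold values in the usage lines are assumptions for illustration only, and the actual ESGE and ASGE thresholds should be taken from the guidelines themselves.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Counts:
    tp: int  # reference positive, predicted positive
    fp: int  # reference negative, predicted positive
    tn: int  # reference negative, predicted negative
    fn: int  # reference positive, predicted negative


def tally(predicted, reference, positive="neoplastic"):
    """Count agreement between predictions and reference histology.

    The label "neoplastic" is a hypothetical choice of positive class.
    """
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have equal length")
    tp = fp = tn = fn = 0
    for pred, ref in zip(predicted, reference):
        if ref == positive:
            if pred == positive:
                tp += 1
            else:
                fn += 1
        else:
            if pred == positive:
                fp += 1
            else:
                tn += 1
    return Counts(tp, fp, tn, fn)


def sensitivity(c):
    """Fraction of reference-positive cases the model called positive."""
    if c.tp + c.fn == 0:
        raise ValueError("no reference-positive cases")
    return c.tp / (c.tp + c.fn)


def specificity(c):
    """Fraction of reference-negative cases the model called negative."""
    if c.tn + c.fp == 0:
        raise ValueError("no reference-negative cases")
    return c.tn / (c.tn + c.fp)


def percent_correct(c):
    """Overall agreement with the reference, as a percentage."""
    total = c.tp + c.fp + c.tn + c.fn
    if total == 0:
        raise ValueError("no cases")
    return 100 * (c.tp + c.tn) / total


def meets_thresholds(c, min_sensitivity, min_specificity):
    """True only if both metrics reach the caller-supplied minimums."""
    return (sensitivity(c) >= min_sensitivity
            and specificity(c) >= min_specificity)


# Made-up example data, not taken from the study.
predicted = ["neoplastic", "neoplastic", "non-neoplastic",
             "non-neoplastic", "neoplastic"]
reference = ["neoplastic", "non-neoplastic", "non-neoplastic",
             "neoplastic", "neoplastic"]
counts = tally(predicted, reference)
# Placeholder minimums; substitute the values from the relevant guideline.
passes = meets_thresholds(counts, min_sensitivity=0.9, min_specificity=0.8)
```

Requiring both minimums at once reflects the point that a model with a high percent-correct score can still fall short on either sensitivity or specificity alone.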
Abstract
Optical diagnosis allows for rapid endoscopic decision-making, but many practitioners are inadequately trained. Large language models, such as Anthropic’s Claude Opus 4, have not been evaluated for their ability to apply the NICE or JNET classification systems to colorectal polyps in order to predict lesion histology more reliably. Additionally, there is a limited reference base for ideal prompting strategies in gastrointestinal endoscopy. We hypothesized that the diagnostic accuracy of multimodal large language models (MLLMs) in classifying colorectal polyps and predicting histology would be comparable to the standards set by societal (ASGE and ESGE) guidelines and to the accuracy of traditional computer-aided diagnostic systems. We conducted a retrospective diagnostic accuracy study using the PRIME dataset, a curated set of white light and narrow-band imaging (NBI) images. We evaluated Claude Opus 4, Google Gemini 2.5 Pro, GPT-o3, GPT-4o,…
Taxonomy
Topics: Colorectal Cancer Screening and Detection · AI in cancer detection · Esophageal Cancer Research and Treatment
