Predicting Early-Onset Colorectal Cancer with Large Language Models

Wilson Lau; Youngwon Kim; Sravanthi Parasa; Md Enamul Haque; Anand Oka; Jay Nanduri

arXiv:2506.11410 · cs.CL · June 16, 2025

Open Access

TL;DR

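This study compares 10 traditional machine learning models with large language models (LLMs) for predicting early-onset colorectal cancer (EoCRC, age < 45) from patient conditions, lab results, and observations recorded in the 6 months before diagnosis. The fine-tuned LLM achieved an average of 73% sensitivity and 91% specificity.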

Contribution

It introduces the application of large language models to early-onset CRC (EoCRC) prediction and compares their performance with 10 conventional machine learning models.

Findings

01

The fine-tuned LLM achieved an average of 73% sensitivity.

02

The fine-tuned LLM achieved an average of 91% specificity.

03

Large language models outperform traditional models.

Abstract

The incidence rate of early-onset colorectal cancer (EoCRC, age < 45) has increased every year, but this population is younger than the recommended age established by national guidelines for cancer screening. In this paper, we applied 10 different machine learning models to predict EoCRC and compared their performance with advanced large language models (LLMs), using patient conditions, lab results, and observations from the 6 months of the patient journey preceding the CRC diagnosis. We retrospectively identified 1,953 CRC patients from multiple health systems across the United States. The results demonstrated that the fine-tuned LLM achieved an average of 73% sensitivity and 91% specificity.
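
The abstract reports results as sensitivity and specificity. As a reminder of what those two numbers measure, here is a minimal sketch of how they are computed from binary predictions. The function name and the toy labels are hypothetical and are not taken from the paper; the paper's own evaluation code and data are not shown on this page.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels.

    Assumed label convention (not from the paper): 1 = EoCRC case, 0 = control.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # cases correctly flagged
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # cases missed
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # controls correctly cleared
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # controls wrongly flagged
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative label")
    # Sensitivity: share of true cases detected. Specificity: share of controls cleared.
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical toy data for illustration only: 4 cases and 6 controls.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
sens, spec = sensitivity_specificity(y_true, y_pred)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
```

In these terms, 73% sensitivity means roughly 73 of every 100 true EoCRC patients are flagged, and 91% specificity means roughly 91 of every 100 patients without EoCRC are correctly not flagged.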

Taxonomy

Topics: Colorectal Cancer Screening and Detection · Radiomics and Machine Learning in Medical Imaging