# A Novel Ensemble Framework for Comprehensive Early-Stage Colorectal Cancer Diagnosis, Prognosis, and Treatment: Integration of Gastroenterology-Specific Transformer Language Models and Multiple Decision Trees

**Authors:** Cem Simsek, Mete Ucdal, Suayib Yalcin, Derya Karakoc

PMC · DOI: 10.3390/jcm14134467 · Journal of Clinical Medicine · 2025-06-23

## TL;DR

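A proof-of-concept ensemble pairs GastroGPT, a gastroenterology-specific transformer language model, with decision tree-based models to assess colorectal cancer risk and predict survival in early-stage disease, evaluated on simulated patient cases.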

## Contribution

A novel ensemble framework integrating gastroenterology-specific transformer models and decision trees for CRC diagnosis and prognosis.

## Key findings

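- GastroGPT extracted relevant data points from patient histories with high accuracy.
- The decision tree-based CRC risk model achieved an AUC-ROC of 0.85 (95% CI: 0.78–0.92) in predicting the need for colonoscopy.
- Survival prediction models achieved C-indices of 0.71 to 0.75 for overall survival and disease-free survival at 24, 36, and 48 months in early-stage CRC patients.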
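
For readers less familiar with the AUC-ROC figure above, the following is a minimal sketch of what the metric measures: the probability that a randomly chosen positive case (here, a patient who needs colonoscopy) receives a higher risk score than a randomly chosen negative case. The function name `auc_roc` and the toy labels and scores are assumptions made for illustration; they are not the paper's code or data.

```python
def auc_roc(labels, scores):
    """Pairwise estimate of AUC-ROC: share of (positive, negative) pairs
    in which the positive case gets the higher score (ties count half)."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative case")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical toy data (NOT from the paper): 1 = colonoscopy needed.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2]
toy_auc = auc_roc(labels, scores)
print(f"toy AUC-ROC = {toy_auc:.3f}")
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect separation, which is the scale on which the reported 0.85 should be read.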

## Abstract

**Background:** Colorectal cancer (CRC) remains a significant global health burden, with early detection and intervention crucial for improving patient outcomes. This study aims to develop and evaluate a novel proof-of-concept ensemble framework combining transformer-based language models and decision tree-based models for early-stage CRC screening, diagnosis, and prognosis.

**Methods:** The ensemble framework consists of four key components: (1) GastroGPT, a transformer-based language model for extracting relevant data points from patient histories; (2) a decision tree-based model for assessing CRC risk and recommending colonoscopy; (3) GastroGPT for extracting data points from early CRC patients’ histories; and (4) a suite of decision tree-based models for predicting survival outcomes in early-stage CRC patients. The study employed a retrospective, observational, methodological design using simulated patient cases.

**Results:** GastroGPT demonstrated high accuracy in extracting relevant data points from patient histories. The decision tree-based model for CRC risk assessment achieved an area under the receiver operating characteristic curve (AUC-ROC) of 0.85 (95% CI: 0.78–0.92) in predicting the need for colonoscopy. The decision tree-based models for survival prediction showed strong performance, with C-indices ranging from 0.71 to 0.75 for overall survival and disease-free survival at 24, 36, and 48 months.

**Conclusions:** The novel ensemble framework demonstrates promising performance in early-stage CRC screening, diagnosis, and prognosis. Further research is needed to validate the models using larger, real-world datasets and to assess their clinical utility in prospective studies.
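
The C-index cited for the survival models can be illustrated with a short sketch of Harrell's concordance index, assuming the standard pairwise definition: among pairs in which the patient with the shorter follow-up had an observed event, it is the fraction where the model assigned that patient the higher risk. The function name `concordance_index`, the simplified handling of tied times, and the toy data are assumptions for illustration only; the abstract does not say how the authors computed the metric.

```python
def concordance_index(times, events, risks):
    """Harrell's C-index for right-censored data (simplified sketch).

    times  : follow-up time per patient
    events : 1 if the event was observed, 0 if censored
    risks  : predicted risk score (higher = earlier expected event)
    Pairs with identical times are skipped for simplicity.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a pair must be anchored on an observed event
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy cohort (NOT from the paper): months, event flag, risk score.
times = [5, 10, 15, 20, 25]
events = [1, 1, 0, 1, 0]
risks = [0.9, 0.55, 0.6, 0.5, 0.1]
toy_c = concordance_index(times, events, risks)
print(f"toy C-index = {toy_c:.3f}")
```

As with AUC-ROC, 0.5 indicates ranking no better than chance and 1.0 indicates perfect ordering of survival times.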

## Linked entities

- **Diseases:** colorectal cancer (MONDO:0005575), CRC (MONDO:0005575)

## Full-text entities

- **Diseases:** CRC (MESH:D015179)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12249666/full.md

## Figures

7 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12249666/full.md

## References

44 references — full list in the complete paper: https://tomesphere.com/paper/PMC12249666/full.md

---
Source: https://tomesphere.com/paper/PMC12249666