# Interpretable machine learning model for predicting 5-Year postoperative recurrence risk in patients with stage III colon cancer using preoperative laboratory tests: a two-centre study

**Authors:** Hangping Wei, Xihao Fu, Yuanyuan Cheng, Li Xu, Xinkai Wu, ZhenXin Wang

PMC · DOI: 10.1186/s12876-025-04511-9 · BMC Gastroenterology · 2026-01-29

## TL;DR

A random forest model built on preoperative laboratory tests and clinicopathological features predicted 5-year recurrence in stage III colon cancer patients with an AUC of 0.845 on an external test cohort.

## Contribution

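The study develops an interpretable machine learning model that predicts 5-year postoperative recurrence in stage III colon cancer from preoperative laboratory tests and clinical and pathological features, and evaluates it on an external cohort from a second centre.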

## Key findings

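- The random forest model achieved the highest AUC (0.845) on the external test cohort; logistic regression followed at 0.823, and the DeLong test found the two comparable.
- Perineural invasion was the most important predictive feature, followed by FIB and PT.
- SHAP values were used to rank and visualize each feature's contribution to the predicted recurrence risk.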

## Abstract

Colorectal cancer (CRC) is one of the most prevalent malignant diseases worldwide and displays significant heterogeneity. This study investigated whether machine learning algorithms that incorporate preoperative laboratory tests can predict the 5-year recurrence risk in patients with stage III colon cancer (CC) after surgery.

This study included two patient cohorts: the Zhejiang Cancer Hospital CC cohort (ZCC set, n = 290), which served as the training cohort, and the Dongyang CC cohort (DYC set, n = 125), which served as an external testing cohort. Univariate analysis was first performed on the 48 preoperative laboratory tests and 15 clinical and pathological features within the training cohort to identify potential predictors. Features with a p value less than 0.05 were retained, and six machine learning models (logistic regression, random forest, XGBoost, support vector machine (SVM), back propagation neural network (BP NET), and K-nearest neighbour (KNN)) were trained to predict the 5-year recurrence risk in patients with stage III colon cancer. Predictive performance was assessed by calculating the area under the receiver operating characteristic curve (AUC) of each model on the external test dataset, and AUCs were compared via the DeLong test. Finally, the Shapley additive explanations (SHAP) algorithm was applied to compute SHAP values for each feature and rank feature importance, and the results were visualized.
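The evaluation step described above can be sketched in a few lines. The following is a minimal, standard-library illustration of how an AUC is computed from predicted scores and how two models scored on the same test set can be compared with a DeLong-style z-test. The function names (`auc`, `delong_test`) and the toy labels and scores are assumptions made for illustration; they are not the authors' code or data.

```python
import math

def _psi(pos_score, neg_score):
    """Pairwise kernel: 1 if the positive case outranks the negative, 0.5 on ties."""
    if pos_score > neg_score:
        return 1.0
    if pos_score == neg_score:
        return 0.5
    return 0.0

def _placements(y_true, scores):
    """Per-case placement values used by the DeLong variance estimate."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    v10 = [sum(_psi(p, q) for q in neg) / len(neg) for p in pos]
    v01 = [sum(_psi(p, q) for p in pos) / len(pos) for q in neg]
    return v10, v01

def auc(y_true, scores):
    """AUC as the probability that a recurrence case outranks a non-recurrence case."""
    v10, _ = _placements(y_true, scores)
    return sum(v10) / len(v10)

def _cov(a, b):
    """Sample covariance with an (n - 1) denominator."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    return sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / (len(a) - 1)

def delong_test(y_true, scores_a, scores_b):
    """Two-sided z-test for the difference between two correlated AUCs.

    Returns (auc_a, auc_b, z, p). If the estimated variance is zero
    (a degenerate case), z and p are returned as NaN.
    """
    v10_a, v01_a = _placements(y_true, scores_a)
    v10_b, v01_b = _placements(y_true, scores_b)
    m, n = len(v10_a), len(v01_a)
    auc_a, auc_b = sum(v10_a) / m, sum(v10_b) / m
    var = ((_cov(v10_a, v10_a) + _cov(v10_b, v10_b) - 2 * _cov(v10_a, v10_b)) / m
           + (_cov(v01_a, v01_a) + _cov(v01_b, v01_b) - 2 * _cov(v01_a, v01_b)) / n)
    if var <= 0:
        return auc_a, auc_b, float("nan"), float("nan")
    z = (auc_a - auc_b) / math.sqrt(var)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail probability
    return auc_a, auc_b, z, p

# Toy data (invented): 1 = recurrence within 5 years, 0 = no recurrence.
y_test = [1, 1, 1, 1, 0, 0, 0, 0]
model_a_scores = [0.9, 0.8, 0.35, 0.6, 0.4, 0.3, 0.2, 0.1]
model_b_scores = [0.6, 0.3, 0.7, 0.2, 0.5, 0.4, 0.1, 0.8]
auc_a, auc_b, z_stat, p_value = delong_test(y_test, model_a_scores, model_b_scores)
```

Because both models are scored on the same patients, their AUC estimates are correlated; the covariance terms in `delong_test` account for that, which is why a paired test of this kind is preferred over comparing two AUCs as if they came from independent samples.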

Univariate analysis identified 10 laboratory tests and 6 clinical and pathological features, which were incorporated into the six machine learning models. The random forest model exhibited the highest predictive performance in the test cohort, with an AUC of 0.845. Logistic regression followed closely, with an AUC of 0.823. The DeLong test showed that the predictive performance of the random forest model was comparable to that of logistic regression and superior to that of the other models. SHAP analysis indicated that the most important feature for predicting the 5-year recurrence risk of stage III colon cancer was perineural invasion, followed by fibrinogen (FIB) and then prothrombin time (PT).
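For readers unfamiliar with how a SHAP-style ranking is produced, the sketch below computes exact Shapley values for a tiny model by enumerating feature subsets, then ranks features by mean absolute contribution. It is a simplification under stated assumptions: absent features are replaced by a single baseline row rather than averaged over a background dataset, and the risk function, feature names, and numbers are invented for illustration. It is not the authors' model or the `shap` library.

```python
from itertools import combinations
from math import factorial

def shapley_values(predict, x, baseline):
    """Exact Shapley values for one row, enumerating all feature subsets.

    Features outside the current subset take their value from `baseline`.
    The cost grows exponentially with the number of features, so this is
    only practical for a handful of features.
    """
    n = len(x)

    def value(subset):
        mixed = [x[i] if i in subset else baseline[i] for i in range(n)]
        return predict(mixed)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi

def rank_features(predict, rows, baseline, names):
    """Rank features by mean absolute Shapley value across rows."""
    totals = [0.0] * len(names)
    for row in rows:
        for i, v in enumerate(shapley_values(predict, row, baseline)):
            totals[i] += abs(v)
    mean_abs = [t / len(rows) for t in totals]
    return sorted(zip(names, mean_abs), key=lambda item: item[1], reverse=True)

def toy_risk(row):
    """Hypothetical risk score with one interaction term (coefficients invented)."""
    pni, fib, pt = row
    return 0.30 * pni + 0.05 * fib + 0.02 * pt + 0.04 * pni * fib

# Hypothetical feature names and values, for illustration only.
feature_names = ["perineural_invasion", "FIB", "PT"]
baseline_row = [0, 3.0, 12.0]
patients = [[1, 4.0, 13.0], [0, 3.5, 11.0], [1, 2.5, 12.5]]

phi_first = shapley_values(toy_risk, patients[0], baseline_row)
ranking = rank_features(toy_risk, patients, baseline_row, feature_names)
```

A useful sanity check is the efficiency property: for any row, the Shapley values sum to the difference between the model's prediction for that row and its prediction for the baseline.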

A machine learning model constructed using preoperative laboratory tests and clinical and pathological features can assist in predicting the 5-year recurrence risk of patients with stage III colon cancer. This model offers a potential reference for developing individualized treatment strategies in clinical practice.

The online version contains supplementary material available at 10.1186/s12876-025-04511-9.

## Linked entities

- **Diseases:** colorectal cancer (MONDO:0005575), colon cancer (MONDO:0002032)

## Full-text entities

- **Genes:** CEACAM3 (CEA cell adhesion molecule 3) [NCBI Gene 1084] {aka CD66D, CEA, CGM1, CGM1a, W264, W282}, CRP (C-reactive protein) [NCBI Gene 1401] {aka PTX1}, ALPP (alkaline phosphatase, placental) [NCBI Gene 250] {aka ALP, PALP, PLAP, PLAP-1}, SHROOM4 (shroom family member 4) [NCBI Gene 57477] {aka MRXSSDS, SHAP, shrm4}, FGB (fibrinogen beta chain) [NCBI Gene 2244] {aka HEL-S-78p}, TENM1 (teneurin transmembrane protein 1) [NCBI Gene 10178] {aka ODZ1, ODZ3, TEN-M1, TEN1, TNM, TNM1}
- **Diseases:** Cancer (MESH:D009369), PT (MESH:D006526), inflammation (MESH:D007249), stage III disease (MESH:D007676), CRC (MESH:D015179), SII (MESH:C566784), PT (MESH:D007020), PNI (MESH:D044342), bowel obstruction (MESH:D012778), rectal cancer (MESH:D012004), immune (MESH:D007154), NLR (MESH:D015467)
- **Chemicals:** 5-fluorouracil, leucovorin, and oxaliplatin (-), XELOX (MESH:C519688), oxaliplatin (MESH:D000077150), lipid (MESH:D008055)
- **Species:** Homo sapiens (human, species) [taxon 9606], Mus musculus (house mouse, species) [taxon 10090]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12857113/full.md

## Figures

5 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12857113/full.md

## References

4 references — full list in the complete paper: https://tomesphere.com/paper/PMC12857113/full.md

---
Source: https://tomesphere.com/paper/PMC12857113