# Interpretable machine-learning prediction of severe myelosuppression in colorectal cancer patients receiving chemotherapy using XGBoost and SHAP: a retrospective study with a web-based calculator

**Authors:** Linxian Ding, Lixia Peng, Zheng Xu, Zhangli Cui, Zhongming Wang

PMC · DOI: 10.3389/fonc.2026.1785146 · 2026-03-19

## TL;DR

This retrospective study of 987 colorectal cancer patients develops an interpretable XGBoost model, explained with SHAP, that predicts severe myelosuppression after chemotherapy, and pairs it with a web-based calculator for real-time risk assessment.

## Contribution

The contribution is an interpretable XGBoost model, explained with SHAP at both the global and individual levels, together with a web-based calculator for predicting chemotherapy-induced severe myelosuppression in CRC patients.

## Key findings

- On the validation set, the XGBoost model achieved an AUC of 0.906 and a sensitivity of 0.864 for severe myelosuppression, the best balance among the compared models.
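- SHAP analysis ranked white blood cell count, number of chemotherapy cycles, and Karnofsky Performance Status score as the top three predictors, and showed nonlinear threshold effects for white blood cell count, platelet count, and serum albumin.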
- A web-based calculator provides real-time individualized risk estimation, and decision curve analysis indicated favorable net clinical benefit across a range of decision thresholds.

## Abstract

Patients with colorectal cancer (CRC) are susceptible to severe myelosuppression (SMS) after chemotherapy. Conventional linear models may have limited performance and may fail to capture complex, nonlinear risk patterns, which can hinder early risk stratification and timely intervention. We aimed to develop an interpretable machine-learning model to predict SMS and to build a web-based calculator for individualized risk assessment.

We retrospectively enrolled 987 CRC patients who received capecitabine plus oxaliplatin with or without targeted therapy at our hospital between March 2021 and November 2025. Nine predictors were selected using least absolute shrinkage and selection operator (LASSO) regression. We developed and compared several models, including extreme gradient boosting (XGBoost), random forest, decision tree, and support vector machine. Model interpretability was assessed using SHapley Additive exPlanations (SHAP) at both the global and individual levels to characterize nonlinear effects and feature interactions. A web-based, real-time risk calculator was also implemented.
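To make the idea behind SHAP attributions concrete, the following is a minimal sketch that computes exact Shapley values by enumerating feature subsets. It is not the authors' model or the SHAP library: `toy_risk`, its thresholds, the patient values, and the baseline are all hypothetical, chosen only to show how a threshold effect and an interaction are split among features. In practice, SHAP uses efficient tree-specific algorithms rather than brute-force enumeration.

```python
from itertools import combinations
from math import factorial


def toy_risk(wbc, cycles, kps):
    """Hypothetical risk score (NOT the study's model)."""
    score = 0.1
    if wbc < 4.0:            # hypothetical threshold effect
        score += 0.3
        if cycles >= 6:      # hypothetical interaction with cycle count
            score += 0.2
    if kps < 80:             # hypothetical additive effect
        score += 0.1
    return score


def shapley_values(model, x, baseline):
    """Exact Shapley values of model(x) relative to model(baseline).

    Features outside a coalition are set to their baseline value.
    """
    names = list(x)
    n = len(names)

    def value(coalition):
        inputs = {k: (x[k] if k in coalition else baseline[k]) for k in names}
        return model(**inputs)

    phi = {}
    for name in names:
        others = [k for k in names if k != name]
        total = 0.0
        for size in range(n):
            # Shapley weight for coalitions of this size
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {name}) - value(s))
        phi[name] = total
    return phi


# Hypothetical patient and reference values, for illustration only.
patient = {"wbc": 3.2, "cycles": 7, "kps": 70}
baseline = {"wbc": 6.0, "cycles": 2, "kps": 90}

phi = shapley_values(toy_risk, patient, baseline)
for feature, contribution in phi.items():
    print(f"{feature}: {contribution:+.3f}")
```

The attributions sum to the difference between the patient's score and the baseline score, which is the additivity property that lets per-patient SHAP values be read as a decomposition of the predicted risk. In this toy case the interaction term is shared equally between `wbc` and `cycles`.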

On the validation set, the XGBoost model achieved the best balance of predictive performance (AUC = 0.906; sensitivity = 0.864). SHAP analysis quantified the contribution of each feature, with the top three contributors being white blood cell count, number of chemotherapy cycles, and Karnofsky Performance Status score. Nonlinear threshold effects were observed for continuous variables, including white blood cell count, platelet count, and serum albumin. Interactions were identified between white blood cell count and performance status, as well as between white blood cell count and number of chemotherapy cycles. The web-based calculator enables real-time individualized risk estimation. Decision curve analysis indicated favorable net clinical benefit across a range of decision thresholds.
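The metrics reported above can be illustrated with a short sketch. It computes AUC as the probability that a randomly chosen positive case receives a higher predicted risk than a randomly chosen negative case, sensitivity at a fixed cutoff, and the net benefit used in decision curve analysis, TP/n − (FP/n) · p_t/(1 − p_t), where p_t is the decision threshold. The labels, probabilities, and the 0.5 cutoff below are hypothetical toy values, not data or settings from the study.

```python
def auc_score(y_true, y_prob):
    """AUC via pairwise comparison of positives and negatives (ties count 0.5)."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    wins = sum(
        1.0 if p > q else 0.5 if p == q else 0.0
        for p in pos
        for q in neg
    )
    return wins / (len(pos) * len(neg))


def sensitivity(y_true, y_prob, threshold):
    """Fraction of true positives flagged at the given probability cutoff."""
    tp = sum(1 for y, p in zip(y_true, y_prob) if y == 1 and p >= threshold)
    fn = sum(1 for y, p in zip(y_true, y_prob) if y == 1 and p < threshold)
    return tp / (tp + fn)


def net_benefit(y_true, y_prob, threshold):
    """Net benefit at one decision threshold (0 < threshold < 1)."""
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if y == 1 and p >= threshold)
    fp = sum(1 for y, p in zip(y_true, y_prob) if y == 0 and p >= threshold)
    return tp / n - (fp / n) * threshold / (1 - threshold)


# Hypothetical toy data: 1 = severe myelosuppression, 0 = none.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_prob = [0.9, 0.8, 0.4, 0.3, 0.2, 0.6, 0.1, 0.7]

auc = auc_score(y_true, y_prob)
sens = sensitivity(y_true, y_prob, threshold=0.5)
nb = net_benefit(y_true, y_prob, threshold=0.5)
print(f"AUC={auc:.4f} sensitivity={sens:.2f} net_benefit={nb:.2f}")
```

A decision curve is obtained by evaluating `net_benefit` over a grid of thresholds and comparing the result with the "treat all" and "treat none" strategies; a model is clinically useful at thresholds where its curve lies above both.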

We developed a high-performing and interpretable model for predicting SMS in CRC patients receiving chemotherapy. The accompanying web-based calculator may provide a practical tool for early risk stratification and individualized management of chemotherapy-related SMS.

## Linked entities

- **Chemicals:** capecitabine (PubChem CID 60953), oxaliplatin (PubChem CID 9887053)
- **Diseases:** colorectal cancer (MONDO:0005575)

## Full-text entities

- **Genes:** ALB (albumin) [NCBI Gene 213] {aka FDAHT, HSA, PRO0883, PRO0903, PRO1341}
- **Diseases:** SMS (MESH:D045169), CRC (MESH:D015179)
- **Chemicals:** oxaliplatin (MESH:D000077150), capecitabine (MESH:D000069287)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Figures

5 figures with captions in the complete paper: https://tomesphere.com/paper/PMC13043381/full.md

---
Source: https://tomesphere.com/paper/PMC13043381