# Interpretable machine learning models to predict survival in esophageal cancer: a study based on the SEER database and external validation in China

**Authors:** Abudouresuli Tuersun, Saimaitikari Abudoubari, Abudoushalamu Abudouwake, Huerxidan Tuerdi, Abulizi Maimaitiyiming, Pahatijiang Nijiati, Ya Qiu, Jianquan Wang

PMC · DOI: 10.3389/fphys.2025.1665383 · Frontiers in Physiology · 2025-10-29

## TL;DR

This study builds interpretable machine learning models to predict survival in esophageal cancer patients using data from the SEER database and validates them in a Chinese cohort.

## Contribution

The novel contribution is the development and validation of interpretable ML models for survival prediction in esophageal cancer with detailed interpretability assessments.

## Key findings

- The NMTLR model showed the best performance in predicting overall survival with AUC above 0.81 for 1-, 3-, and 5-year survival.
- Interpretability analyses identified M stage, N stage, age, and metastases as key predictors with consistent effects across datasets.
- The models showed good discriminative power, and integrated Brier scores stayed below 0.175 in all validation sets.

## Abstract

We developed interpretable machine learning (ML) models to predict the overall survival (OS) of esophageal cancer patients. This approach aims to make our modeling results more interpretable and transparent.

We collected the clinicopathological information of esophageal cancer patients from the SEER database and divided them into training and validation sets at a ratio of 7:3. We also obtained an external validation cohort from the First People’s Hospital of Kashi in Xinjiang, China. Using LASSO and multivariate Cox regression analyses, we identified relevant risk factors and used them to develop a CoxPH model and six ML models: Random Survival Forest (RSF), gradient boosting with component-wise linear models (GLMboost), decision tree (dt), boosting tree (bt), DeepSurv, and neural multi-task logistic regression (NMTLR). We evaluated the predictive performance of these models using the C-index, integrated cumulative/dynamic AUC, integrated Brier score, Kolmogorov-Smirnov (KS) test, and Cramér-von Mises (CvM) test. For interpretability assessment, we employed three complementary methods: (1) time-dependent variable importance to quantify feature contribution across follow-up periods; (2) partial correlation survival plots to visualize individual variable effects; and (3) aggregated Survival SHapley Additive exPlanations (SurvSHAP) plots with mean absolute deviation metrics to validate feature impact stability at both individual and population levels.
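As an illustration of the first metric listed above, the following is a minimal sketch of Harrell's C-index in plain Python. It is not the authors' code: the function name `harrell_c_index` and the toy inputs are assumptions made for this example, and ties in survival time are simply skipped. A pair of patients is comparable when the one with the shorter follow-up had an observed event; the pair is concordant when that patient also received the higher predicted risk.

```python
from typing import Sequence


def harrell_c_index(
    times: Sequence[float],
    events: Sequence[int],
    risk_scores: Sequence[float],
) -> float:
    """Fraction of comparable patient pairs ranked correctly by predicted risk.

    times       : follow-up time per patient
    events      : 1 if death was observed, 0 if the patient was censored
    risk_scores : model output, higher means higher predicted risk
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            # A censored patient cannot anchor a pair: the true event
            # time is unknown, so the ordering cannot be verified.
            continue
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs in the data")
    return concordant / comparable


# Hypothetical toy data: four patients, the third one censored.
toy_times = [5, 10, 15, 20]
toy_events = [1, 1, 0, 1]
toy_risk = [0.9, 0.1, 0.4, 0.6]
print(harrell_c_index(toy_times, toy_events, toy_risk))
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which is the scale on which the C-index values in this kind of study are read.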

The final ML model consisted of 11 factors: grade, stage, T stage, N stage, M stage, radiotherapy, chemotherapy, bone metastasis, liver metastasis, lung metastasis, and age. Our predictive models demonstrated significant discriminative power, and the NMTLR model performed best. For the training, validation, and external validation sets, the area under the curve (AUC) for 1-, 3-, and 5-year OS was higher than 0.81, and the integrated Brier score was consistently lower than 0.175. Interpretability analyses confirmed consistent predictive logic: M stage, N stage, age, grade, bone metastases, liver metastases, lung metastases, and radiotherapy were identified as the most influential predictors via quantifiable SurvSHAP values and time-dependent importance weights, with their effects visually validated through partial correlation survival curves.
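To give the 0.175 threshold some context, here is a simplified sketch of how an integrated Brier score can be computed. It is an assumption-laden illustration rather than the study's procedure: it ignores censoring (real implementations reweight by inverse probability of censoring), and the names `brier_score_at`, `integrated_brier_score`, and the toy data are hypothetical.

```python
from typing import Sequence


def brier_score_at(
    horizon: float,
    times: Sequence[float],
    surv_probs: Sequence[float],
) -> float:
    """Mean squared gap between observed status and predicted survival.

    Assumes every patient's status at `horizon` is known (no censoring
    before the horizon), which is a simplification.
    """
    total = 0.0
    for time, prob in zip(times, surv_probs):
        alive = 1.0 if time > horizon else 0.0
        total += (alive - prob) ** 2
    return total / len(times)


def integrated_brier_score(
    grid: Sequence[float],
    times: Sequence[float],
    surv_matrix: Sequence[Sequence[float]],
) -> float:
    """Trapezoidal average of the Brier score over a time grid.

    surv_matrix[k][i] is the predicted survival probability of patient i
    at time grid[k].
    """
    scores = [
        brier_score_at(t, times, surv_matrix[k]) for k, t in enumerate(grid)
    ]
    area = 0.0
    for k in range(1, len(grid)):
        width = grid[k] - grid[k - 1]
        area += 0.5 * (scores[k] + scores[k - 1]) * width
    return area / (grid[-1] - grid[0])


# Hypothetical toy data: survival times in months, evaluated at 1, 3, 5 years.
toy_times = [6, 18, 40, 70]
toy_grid = [12, 36, 60]
uninformative = [[0.5] * 4 for _ in toy_grid]
print(integrated_brier_score(toy_grid, toy_times, uninformative))
```

A model that always predicts a survival probability of 0.5 scores 0.25 at every horizon, so lower values indicate predictions that are closer to the observed outcomes, and 0 would be a perfect score.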

The NMTLR prognostic model is the most effective at predicting the OS of esophageal cancer patients. It helps physicians correctly assess patient survival and provides valuable information for diagnosis and prognosis evaluation.

## Linked entities

- **Diseases:** esophageal cancer (MONDO:0007576)

## Full-text entities

- **Diseases:** esophageal cancer (MESH:D004938), bone metastases (MESH:D009362)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12605121/full.md

## Figures

7 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12605121/full.md

## References

17 references — full list in the complete paper: https://tomesphere.com/paper/PMC12605121/full.md

---
Source: https://tomesphere.com/paper/PMC12605121