# Machine learning-driven risk stratification for distant metastasis in gastric cancer: A comparative study of clinical features and composite indices integrated models

**Authors:** Shaoxue Yang, Han Lei

PMC · DOI: 10.1371/journal.pone.0335258 · 2025-10-30

## TL;DR

This study develops a machine learning model to predict distant metastasis in gastric cancer patients before surgery, using clinical features and lab indicators.

## Contribution

A novel interpretable machine learning model integrating clinical features and composite indices for preoperative metastasis prediction in gastric cancer.

## Key findings

- Logistic Regression achieved the highest AUC of 0.942 for predicting distant metastasis.
- Five key features were identified: cT stage, cN stage, differentiation grade, PLR, and TMI.
- The model showed strong performance in both internal and external test cohorts.

## Abstract

Distant metastasis (DM) of gastric cancer (GC) represents a significant health challenge due to its high mortality rate, necessitating advances in early detection and management strategies. The objective of this study was to develop an interpretable machine learning (ML) model for preoperative prediction of DM in GC.

We retrospectively analyzed 1,009 GC patients: 769 from Zhejiang Cancer Hospital formed the development cohort, and 240 from Zhejiang Provincial Hospital of Chinese Medicine formed the external test cohort. Nine clinical features and four composite indices derived from ten laboratory indicators were selected as candidate features. The dataset was balanced using the borderline Synthetic Minority Over-sampling Technique (SMOTE) and the Edited Nearest Neighbors (ENN) under-sampling method. Univariate and multivariate analyses were used to identify key metastasis-related features. Based on the identified features, we developed predictive models using five ML algorithms, with performance evaluated via receiver operating characteristic (ROC) curves, recall, and precision-recall (PR) curves. Finally, Shapley additive explanations (SHAP) analysis was applied to rank feature importance and explain the final model.
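As a rough illustration of two of the evaluation metrics named above, the following sketch computes ROC AUC through its rank interpretation (the probability that a randomly chosen metastatic case scores higher than a randomly chosen non-metastatic case) and recall at a fixed cut-off. The function names, the 0.5 threshold, and the toy labels and scores are assumptions made for this example; they are not taken from the study.

```python
def roc_auc(y_true, scores):
    """ROC AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def recall(y_true, scores, threshold=0.5):
    """Share of true positives whose score reaches the threshold (assumed 0.5)."""
    tp = sum(1 for y, s in zip(y_true, scores) if y == 1 and s >= threshold)
    fn = sum(1 for y, s in zip(y_true, scores) if y == 1 and s < threshold)
    if tp + fn == 0:
        raise ValueError("no positive labels")
    return tp / (tp + fn)


# Toy data (hypothetical): 1 = distant metastasis, 0 = no metastasis.
labels = [0, 0, 1, 1]
predicted_risk = [0.1, 0.4, 0.35, 0.8]

auc_value = roc_auc(labels, predicted_risk)
recall_value = recall(labels, predicted_risk)
```

The pairwise loop is quadratic in cohort size, which is acceptable for a sketch; a sort-based rank formula gives the same value faster on larger cohorts.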

Univariate and multivariate analyses identified five metastasis-related features: cT stage, cN stage, differentiation grade, PLR, and TMI. Logistic Regression emerged as the optimal predictive model among the five, with the highest area under the ROC curve (AUC) of 0.942 (95% CI: 0.922–0.962), a recall of 0.895 (95% CI: 0.843–0.947), and an area under the PR curve (AUPRC) of 0.889 (95% CI: 0.867–0.911). In the internal and external test cohorts, the AUC values were 0.935 (95% CI: 0.897–0.972) and 0.879 (95% CI: 0.833–0.926), respectively. The SHAP analysis revealed the features that played a significant role in the predictions made by the model.
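To make the SHAP step more concrete for a Logistic Regression model: when features are treated as independent, the SHAP value of a feature in log-odds space reduces to its coefficient times the feature's deviation from the background mean. The sketch below applies that closed form. All coefficients, the intercept, the background means, the feature encodings, and the patient values are hypothetical placeholders chosen for illustration; none of them come from the paper, and the authors' actual SHAP configuration is not described in this summary.

```python
FEATURES = ["cT stage", "cN stage", "differentiation grade", "PLR", "TMI"]


def log_odds(weights, intercept, x):
    """Linear predictor of a logistic regression model (log-odds scale)."""
    return intercept + sum(w * xi for w, xi in zip(weights, x))


def linear_shap(weights, background_means, x):
    """Per-feature SHAP values for a linear model, assuming independent features.

    phi_i = w_i * (x_i - mean_i), expressed in log-odds units.
    """
    return [w * (xi - mu) for w, mu, xi in zip(weights, background_means, x)]


# Hypothetical model parameters and data (NOT from the study).
weights = [0.8, 0.6, 0.5, 0.004, 0.3]
intercept = -4.0
background_means = [2.5, 1.0, 2.0, 150.0, 1.0]
patient = [4.0, 2.0, 3.0, 220.0, 2.0]

phi = linear_shap(weights, background_means, patient)

# Base value: the model's log-odds at the background mean.
base_value = log_odds(weights, intercept, background_means)

# Rank features by the magnitude of their contribution for this patient.
ranking = sorted(zip(FEATURES, phi), key=lambda item: abs(item[1]), reverse=True)
```

The base value plus the sum of the per-feature contributions recovers the patient's log-odds exactly, which is the additivity property that makes SHAP attributions straightforward to read for a linear model.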

This ML model integrates clinical features and composite indices to predict GC metastasis risk, supported by an online tool to guide preoperative decision-making.

## Linked entities

- **Diseases:** gastric cancer (MONDO:0001056)

## Full-text entities

- **Diseases:** Cancer (MESH:D009369), DM (MESH:D009362), GC (MESH:D013274)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Figures

11 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12574934/full.md

---
Source: https://tomesphere.com/paper/PMC12574934