# Machine Learning for Lymph Node Metastasis Prediction in Early Gastric Cancer: A Comparative Analysis

**Authors:** Yufan Chen, Kunhao Bai, Minghui Yang, Chao Ma, Xiaohang Gao, Guoliang Xu, Yingbo Chen, Rong Zhang

PMC · DOI: 10.7150/ijms.124229 · 2026-02-11

## TL;DR

This study compares machine learning models to predict lymph node metastasis in early gastric cancer patients, aiming to improve treatment decisions.

## Contribution

A comparative analysis of seven machine learning models for predicting lymph node metastasis in early gastric cancer.

## Key findings

- Random Forest, Extreme Gradient Boosting, and Neural Network models performed best overall, with validation-set AUCs of 0.796, 0.788, and 0.779, respectively.
- In the T1a and T1b subgroups, Logistic Models and Random Forest outperformed the other models (validation AUCs of 0.710 and 0.636 for T1a; 0.666 and 0.658 for T1b).
- SHAP analysis identified the variables driving lymph node metastasis predictions in the overall cohort and in each T-stage subgroup.
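
SHAP attributes each prediction to its input features using Shapley values from cooperative game theory. The summary does not describe the authors' SHAP setup, so the following is a minimal sketch under stated assumptions: features absent from a coalition are replaced by a fixed reference value, and the model is a hypothetical toy risk score with made-up weights (not the authors' model). It enumerates every feature coalition, which is exact but only practical for a handful of features; the SHAP library instead relies on model-specific algorithms (such as TreeSHAP for tree ensembles) or sampling approximations.

```python
from itertools import combinations
from math import factorial
from typing import Callable, Sequence


def shapley_values(
    predict: Callable[[Sequence[float]], float],
    x: Sequence[float],
    baseline: Sequence[float],
) -> list[float]:
    """Exact Shapley value of each feature for one prediction.

    Features outside a coalition are set to their baseline value
    (a simplifying assumption; SHAP typically averages over a background dataset).
    """
    n = len(x)

    def value(coalition: frozenset) -> float:
        # Model output with coalition features taken from x, the rest from baseline.
        z = [x[j] if j in coalition else baseline[j] for j in range(n)]
        return predict(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):  # coalitions of the other features: size 0..n-1
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = frozenset(subset)
                # Weighted marginal contribution of feature i to coalition s.
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


# Hypothetical toy risk score (NOT the authors' model): features are
# tumor size in cm, a T1b flag, and a lymphovascular-invasion flag.
def toy_risk(z: Sequence[float]) -> float:
    return 0.1 * z[0] + 0.3 * z[1] + 0.4 * z[2] + 0.2 * z[1] * z[2]


patient = [3.0, 1.0, 1.0]
reference = [2.0, 0.0, 0.0]
phi = shapley_values(toy_risk, patient, reference)
print([round(v, 3) for v in phi])  # [0.1, 0.4, 0.5]
```

The contributions sum to the patient's score minus the reference score (1.2 - 0.2 = 1.0), and the interaction term is split equally between the two interacting flags, which is the behavior that makes Shapley-based attributions additive.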

## Abstract

Lymph node metastasis (LNM) plays a crucial role in informing treatment decisions and prognosis for early gastric cancer (EGC). This study aimed to offer a practical approach for predicting LNM in EGC using machine learning algorithms.

This study collected data from 1085 patients with EGC who underwent radical gastrectomy with D1+ or D2 lymph node dissection. Seven machine learning algorithms were compared, and their hyperparameters were tuned to identify the model with the best accuracy, Brier score, and area under the curve (AUC). The performance of the selected model was then evaluated.
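
The three selection criteria can be made concrete. The following is a minimal pure-Python sketch of their standard definitions, not the authors' implementation (which the summary does not describe); the labels and probabilities are made up, and the 0.5 threshold used for accuracy is an assumption.

```python
from typing import Sequence


def accuracy(y_true: Sequence[int], y_prob: Sequence[float], threshold: float = 0.5) -> float:
    """Share of patients whose thresholded prediction matches the observed LNM status."""
    hits = sum(int(p >= threshold) == y for y, p in zip(y_true, y_prob))
    return hits / len(y_true)


def brier_score(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Mean squared error of predicted probabilities; 0 is perfect, lower is better."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def roc_auc(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Probability that a random LNM-positive case is scored above a random negative one.

    Ties count as half a win (the Mann-Whitney formulation of the ROC AUC).
    """
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Made-up validation labels (1 = LNM present) and predicted probabilities.
y = [1, 0, 1, 0, 0]
prob = [0.8, 0.3, 0.4, 0.5, 0.1]
print(accuracy(y, prob))               # 0.6
print(round(brier_score(y, prob), 3))  # 0.15
print(round(roc_auc(y, prob), 3))      # 0.833
```

In practice, `accuracy_score`, `brier_score_loss`, and `roc_auc_score` in `sklearn.metrics` compute the same quantities; the explicit loops here only make the definitions visible. Note that accuracy depends on the chosen threshold, whereas the Brier score and AUC use the probabilities directly.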

After comparison, the Random Forest (RF), Extreme Gradient Boosting (XGBoost), and Neural Network (NNT) models performed well on the training dataset and achieved validation-set AUCs of 0.796, 0.788, and 0.779, respectively. We conducted parallel analyses within the T1a and T1b subgroups: in the T1a validation set, Logistic Models (LM) and RF yielded AUCs of 0.710 and 0.636, and in the T1b validation set, LM, RF, and XGBoost achieved AUCs of 0.666, 0.658, and 0.558, respectively. Variable importance analysis using SHAP revealed different variable importance profiles for LNM in the overall EGC cohort and in the T1a and T1b subgroups.
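
The abstract describes parallel analyses within the T1a and T1b subgroups. The sketch below covers only the evaluation side of such an analysis: validation cases are split by T stage and an AUC is computed within each stratum. The data, the function names, and the pairwise AUC formulation are illustrative assumptions, not the authors' code.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def auc(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Pairwise (Mann-Whitney) ROC AUC; ties count as half a win."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    if not pos or not neg:
        raise ValueError("subgroup needs both LNM-positive and LNM-negative cases")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def auc_by_subgroup(
    groups: Sequence[str], y_true: Sequence[int], y_prob: Sequence[float]
) -> Dict[str, float]:
    """Split validation cases by subgroup label (e.g. T stage) and score each split separately."""
    buckets: Dict[str, Tuple[List[int], List[float]]] = defaultdict(lambda: ([], []))
    for g, y, p in zip(groups, y_true, y_prob):
        labels, probs = buckets[g]
        labels.append(y)
        probs.append(p)
    return {g: auc(labels, probs) for g, (labels, probs) in sorted(buckets.items())}


# Made-up validation data: T stage, LNM status, and a model's predicted probability.
stage = ["T1a", "T1a", "T1a", "T1a", "T1b", "T1b", "T1b", "T1b"]
lnm = [1, 0, 0, 1, 1, 0, 1, 0]
prob = [0.9, 0.2, 0.4, 0.3, 0.6, 0.7, 0.5, 0.1]
print(auc_by_subgroup(stage, lnm, prob))  # {'T1a': 0.75, 'T1b': 0.5}
```

Stratifying shrinks each evaluation set, so per-subgroup AUCs are estimated from fewer cases and are correspondingly less precise than the pooled AUC.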

Machine learning models have the potential to guide more effective treatment strategies for EGC, particularly by predicting LNM. The identified risk factors provide useful insights for personalized decision-making in the management of patients with EGC.

## Linked entities

- **Diseases:** early gastric cancer (MONDO:0001060)

## Full-text entities

- **Genes:** ALB (albumin) [NCBI Gene 213] {aka FDAHT, HSA, PRO0883, PRO0903, PRO1341}, AFP (alpha fetoprotein) [NCBI Gene 174] {aka AFPD, FETA, HPAFP}, MUC16 (mucin 16, cell surface associated) [NCBI Gene 94025] {aka CA125}, CRP (C-reactive protein) [NCBI Gene 1401] {aka PTX1}, MUC1 (mucin 1, cell surface associated) [NCBI Gene 4582] {aka ADMCKD, ADMCKD1, ADTKD2, CA 15-3, CD227, Ca15-3}, CEACAM3 (CEA cell adhesion molecule 3) [NCBI Gene 1084] {aka CD66D, CEA, CGM1, CGM1a, W264, W282}
- **Diseases:** ulcerative (MESH:D014456), mucinous adenocarcinoma (MESH:D002288), carcinoembryonic antigen (MESH:D018236), T stage 1a disease (MESH:D007676), metastasis (MESH:D009362), LVI (MESH:D009361), poorly differentiated carcinomas (MESH:D020522), signet-ring cell carcinoma (MESH:D018279), Gastric Fundic Gland Adenocarcinoma (MESH:C566775), SVC (MESH:D000079426), LNM (MESH:D008207), papillary adenocarcinoma (MESH:D000231), adenocarcinoma (MESH:D000230), EGA (MESH:D013274)
- **Chemicals:** PI (-)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Figures

3 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12964569/full.md

---
Source: https://tomesphere.com/paper/PMC12964569