# Lymph node metastasis in patients with hepatocellular carcinoma using machine learning: a population-based study

**Authors:** Li Yuqin, Li Hongyan, Li Hongyuan, Li Tingting, He Kun, Fang Jie, Han Yunhui

PMC · DOI: 10.3389/fonc.2025.1601985 · Frontiers in Oncology · 2025-07-11

## TL;DR

This study develops a machine learning model to predict lymph node metastasis in hepatocellular carcinoma patients, helping identify those needing closer monitoring.

## Contribution

A population-adapted logistic regression model is proposed for predicting lymph node metastasis in HCC; it retained an area under the curve of 0.73 in external validation on 57 patients.

## Key findings

- Race, sequence number, tumor size, T stage, and AFP were identified as independent predictors of lymph node metastasis.
- The logistic regression model showed the best performance with an area under the curve of 0.751 in the SEER dataset.
- The model demonstrated robust generalizability with an area under the curve of 0.73 in external validation.

## Abstract

This study aims to develop a population-adapted, machine learning-based prediction model for lymph node metastasis (LNM) in hepatocellular carcinoma (HCC) to identify high-risk patients requiring intensive surveillance.

Data from 23511 HCC patients in the SEER database and 57 patients from our hospital were analyzed. Seven LNM risk indicators were selected. Four machine learning algorithms were employed to construct prediction models: decision tree (DT), logistic regression (LR), multilayer perceptron (MLP), and extreme gradient boosting (XGBoost). Model performance was evaluated using area under the curve, accuracy, sensitivity, and specificity.
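The four evaluation metrics named above can be computed from true labels and predicted probabilities. The following is a minimal sketch in plain Python, not the authors' code. The function names, the 0.5 decision threshold, and the toy labels and scores are assumptions made for illustration, and the area under the curve is taken here to be the area under the ROC curve.

```python
from typing import Dict, Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (tp, fp, tn, fn) for binary labels, where 1 means LNM present."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, fp, tn, fn


def roc_auc(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """AUC as the probability that a random positive outscores a random
    negative, with ties counted as one half."""
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def summary_metrics(y_true: Sequence[int], y_score: Sequence[float],
                    threshold: float = 0.5) -> Dict[str, float]:
    """Compute accuracy, sensitivity, specificity, and AUC.

    The threshold is a hypothetical default, not a value from the paper.
    """
    y_pred = [1 if s >= threshold else 0 for s in y_score]
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "sensitivity": tp / (tp + fn),   # true positive rate
        "specificity": tn / (tn + fp),   # true negative rate
        "auc": roc_auc(y_true, y_score),
    }


# Toy data for illustration only (not taken from SEER or the hospital cohort).
toy_labels = [1, 1, 1, 0, 0, 0, 0, 0]
toy_scores = [0.9, 0.6, 0.3, 0.7, 0.4, 0.2, 0.1, 0.05]
toy_metrics = summary_metrics(toy_labels, toy_scores)
print(toy_metrics)
```

On this toy input the sketch gives an accuracy of 0.75, a sensitivity of about 0.667, a specificity of 0.8, and an AUC of 0.8. Unlike the other three metrics, the AUC does not depend on the chosen threshold.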

Among 23511 SEER patients, 1679 (7.14%) exhibited LNM. Race, sequence number, tumor size, T stage, and AFP were identified as independent predictors of LNM. The LR model achieved the best performance (area under the curve: 0.751; accuracy: 0.707; sensitivity: 0.711; specificity: 0.661). In external validation with 57 patients from our hospital, the LR model outperformed the other models and generalized well (area under the curve: 0.73; accuracy: 0.737; sensitivity: 0.829; specificity: 0.5).
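The reported LNM rate can be checked directly from the two counts above. The short sketch below also derives the accuracy of a trivial rule that predicts "no LNM" for every patient. That baseline is not reported in the paper and is added here only as an assumption-free arithmetic illustration of why accuracy alone is hard to interpret when the positive class is this rare. Variable names are hypothetical.

```python
# Counts taken from the abstract.
total_patients = 23511
lnm_patients = 1679

# Share of SEER patients with lymph node metastasis.
prevalence = lnm_patients / total_patients

# Accuracy of a rule that always predicts "no LNM" (illustrative baseline,
# not a result from the paper).
majority_baseline_accuracy = 1 - prevalence

print(f"LNM prevalence: {prevalence:.2%}")                        # 7.14%
print(f"Always-negative accuracy: {majority_baseline_accuracy:.2%}")  # 92.86%
```

Such a rule would have a sensitivity of 0, which is why sensitivity, specificity, and the area under the curve are reported alongside accuracy.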

The LR-based model demonstrates superior predictive capability for LNM in HCC, offering clinicians a valuable tool to guide personalized therapeutic strategies.

## Linked entities

- **Diseases:** hepatocellular carcinoma (MONDO:0007256)

## Full-text entities

- **Genes:** AFP (alpha fetoprotein) [NCBI Gene 174] {aka AFPD, FETA, HPAFP}
- **Diseases:** Tumor (MESH:D009369), HCC (MESH:D006528), LNM (MESH:D008207)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12289565/full.md

## Figures

7 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12289565/full.md

## References

34 references — full list in the complete paper: https://tomesphere.com/paper/PMC12289565/full.md

---
Source: https://tomesphere.com/paper/PMC12289565