# Development of a machine learning model to predict overall survival for large hepatocellular carcinoma at BCLC stage A or B after curative hepatectomy

**Authors:** Tai-Xin Yang, Jia-Yong Su, Min-Jun Li, Shuang Shen, Yu Wang, Huan-Nan Wei, Ming-Jian Huang, Qing-Man Qin, You-Yin Ran, Yao-Ting Huang, Jin-Yan Huang, Bang-De Xiang, Jie Zhang, Wen-Feng Gong

PMC · DOI: 10.3389/fimmu.2025.1640075 · Frontiers in Immunology · 2025-10-21

## TL;DR

This study developed and validated a machine learning model to predict overall survival after curative hepatectomy in patients with large hepatocellular carcinoma, with the aim of supporting personalized postoperative management.

## Contribution

The main contribution is an interpretable gradient boosting machine (GBM) model, explained with SHAP, that predicts overall survival in patients with large hepatocellular carcinoma after curative hepatectomy.

## Key findings

- In the validation group, the GBM model achieved AUC values of 0.742, 0.744, and 0.750 for 1-, 3-, and 5-year overall survival, respectively.
- The model performed comparably with or better than established postoperative predictive models and stratified patients into distinct prognostic groups.
- A web-based calculator was developed to generate risk scores for clinical use.
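
To make the AUC figures above concrete, the following is a minimal sketch of how an AUC at a fixed time horizon can be computed from risk scores. It is a simplification and not necessarily the authors' method: patients censored before the horizon are simply excluded, whereas published analyses often use weighted time-dependent ROC estimators. The function name `auc_at_horizon` and the toy data are hypothetical.

```python
from typing import Sequence


def auc_at_horizon(
    times: Sequence[float],
    events: Sequence[int],
    scores: Sequence[float],
    horizon: float,
) -> float:
    """Simplified AUC for death by `horizon` (assumption: censored-early cases dropped).

    times   -- follow-up time per patient (e.g. months)
    events  -- 1 if death was observed, 0 if censored
    scores  -- model risk score (higher means higher predicted risk)
    """
    cases, controls = [], []
    for t, e, s in zip(times, events, scores):
        if e == 1 and t <= horizon:
            cases.append(s)      # died on or before the horizon
        elif t > horizon:
            controls.append(s)   # known to be alive at the horizon
        # Censored before the horizon: status at the horizon is unknown, so skip.
    if not cases or not controls:
        raise ValueError("need at least one case and one control")

    # Probability that a random case outranks a random control; ties count half.
    wins = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(cases) * len(controls))


# Toy data (hypothetical, not from the study): months, event flags, risk scores.
toy_times = [6, 10, 20, 30, 40, 70, 8, 65]
toy_events = [1, 1, 1, 0, 1, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.4, 0.5, 0.6, 0.1, 0.7, 0.2]

auc_1y = auc_at_horizon(toy_times, toy_events, toy_scores, horizon=12)
auc_3y = auc_at_horizon(toy_times, toy_events, toy_scores, horizon=36)
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect ranking, so values near 0.74 to 0.75 indicate moderate discrimination.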

## Abstract

Patients with large hepatocellular carcinoma (LHCC) have a poor prognosis even after curative hepatectomy. This study aimed to develop and validate an interpretable machine learning (ML) model to predict their overall survival (OS).

This study included 2,565 patients with hepatocellular carcinoma (HCC) who underwent curative hepatectomy between January 2014 and December 2021. The LHCC patients (n=1,526) were randomly assigned in a 7:3 ratio to a training group (n=1,069) or a validation group (n=457). Independent risk factors for OS were identified using multivariable Cox regression. Eight ML models were developed and compared. The optimal model’s interpretability was assessed using Shapley Additive Explanations (SHAP).
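
The 7:3 random assignment described above can be sketched as follows. This is an illustration only: the seed, the function name `split_train_validation`, and the choice to round the training size up are assumptions (rounding up happens to reproduce the reported group sizes of 1069 and 457 from their total of 1526); the paper's actual randomization procedure is not given in this summary.

```python
import math
import random
from typing import Iterable, List, Tuple


def split_train_validation(
    patient_ids: Iterable[int],
    train_fraction: float = 0.7,
    seed: int = 42,  # hypothetical seed, only for reproducibility of the sketch
) -> Tuple[List[int], List[int]]:
    """Shuffle patient identifiers and split them into training and validation lists."""
    ids = list(patient_ids)
    rng = random.Random(seed)          # local generator, leaves global state untouched
    rng.shuffle(ids)
    # Assumption: round the training size up to the next whole patient.
    n_train = math.ceil(train_fraction * len(ids))
    return ids[:n_train], ids[n_train:]


# 1526 = 1069 + 457, the sum of the two reported group sizes.
train_ids, validation_ids = split_train_validation(range(1526))
```

Splitting on patient identifiers rather than on rows keeps every patient in exactly one group, which is what prevents leakage between training and validation.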

Patients with LHCC had considerably worse OS than patients with small hepatocellular carcinoma (SHCC) (hazard ratio [HR]: 1.810; 95% confidence interval [CI]: 1.585-2.068). Among the eight ML models, the gradient boosting machine (GBM) model performed best. In the validation group, the GBM model achieved area under the receiver operating characteristic curve (AUC) values of 0.742, 0.744, and 0.750 for 1-, 3-, and 5-year OS, respectively. These results were comparable with or superior to those of established postoperative predictive models. The GBM model stratified patients with LHCC into distinct prognostic groups. A web-based calculator was developed to generate risk scores. Notably, the GBM model showed higher predictive accuracy in patients with a high neutrophil-lymphocyte ratio (C-index: 0.819).
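
For readers unfamiliar with the C-index reported above, here is a minimal sketch of Harrell's concordance index for right-censored survival data. It is a generic textbook formulation, not code from the paper; the function name `harrell_c_index` and the toy data are hypothetical, and pairs with tied survival times are simply not counted as comparable.

```python
from typing import Sequence


def harrell_c_index(
    times: Sequence[float],
    events: Sequence[int],
    risk_scores: Sequence[float],
) -> float:
    """Fraction of comparable patient pairs whose risk ordering matches outcome ordering.

    A pair (i, j) is comparable when patient i has an observed death
    strictly earlier than patient j's follow-up time.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if events[i] != 1:
            continue  # a censored patient cannot be the earlier event in a pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0   # earlier death had the higher predicted risk
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5   # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data (hypothetical): follow-up in months, event flags, predicted risk.
toy_times = [5, 12, 20, 30]
toy_events = [1, 0, 1, 1]
toy_risk = [0.9, 0.3, 0.4, 0.6]

c_index = harrell_c_index(toy_times, toy_events, toy_risk)
```

In the toy data, three of the four comparable pairs are ordered correctly, giving 0.75. On the same scale, the reported 0.819 means roughly 82% of comparable pairs in the high neutrophil-lymphocyte ratio subgroup were ranked correctly.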

The GBM-based model demonstrated the potential to predict prognosis for patients with LHCC after curative hepatectomy. This interpretable model may assist in personalized risk assessment and tailoring postoperative management strategies.

## Linked entities

- **Diseases:** hepatocellular carcinoma (MONDO:0007256)

## Full-text entities

- **Diseases:** HCC (MESH:D006528)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12583128/full.md

## Figures

6 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12583128/full.md

## References

63 references — full list in the complete paper: https://tomesphere.com/paper/PMC12583128/full.md

---
Source: https://tomesphere.com/paper/PMC12583128