# A machine learning-driven prognostic model based on peripheral blood lymphocyte subsets in osteosarcoma

**Authors:** Longqing Li, Jinlei Liu, Songtao Pang, Yuan Zhao, Yimeng Wang, Jia Wen, Yongkui Liu, Yi Zhang, Yan Zhang, Jiazhen Li, Nan Zhou, Xinchang Lu

PMC · DOI: 10.3389/fimmu.2026.1733518 · 2026-01-28

## TL;DR

This study applies machine learning to pre-treatment peripheral blood lymphocyte subsets to build a prognostic model for osteosarcoma patients. The resulting risk score predicted 3-year survival better than the traditional inflammatory indices NLR and PLR.

## Contribution

The main contribution is a two-variable gradient boosting model, built on CD3-CD56+ NK cells and CD8+HLA-DR+ T cells, that improves risk stratification in osteosarcoma.

## Key findings

- The GBM model using CD3-CD56+ NK cells and CD8+HLA-DR+ T cells achieved an AUC of 0.959 for predicting survival.
- The gbm_risk_score was an independent prognostic factor with a hazard ratio of 14.516 and P = 0.012.
- A nomogram combining the GBM risk group and metastasis status had a C-index of 0.883 for 3-year survival prediction.

## Abstract

The prognosis of osteosarcoma (OS) remains heterogeneous, and the prognostic value of peripheral blood lymphocyte subsets, analyzed through machine learning (ML), is not fully explored. This study aimed to develop an ML-based prognostic model using lymphocyte subset data to improve risk stratification for OS patients.

We retrospectively analyzed data from 65 high-grade OS patients. Peripheral blood lymphocyte subsets were quantified by flow cytometry prior to treatment. Seven algorithms were employed to construct prognostic models: stepwise Cox, LASSO, and five ML models (RSF, GBM, XGBoost, SVM, KNN). Model performance was evaluated using the C-index and AUC, and validated via bootstrap resampling and cross-validation.
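The C-index used for evaluation measures how often a model ranks pairs of patients in the correct order of survival. A minimal sketch of Harrell's C-index in plain Python is given here, assuming right-censored data and a score where a higher value means higher predicted risk. The function name `harrell_c_index` and the toy data are hypothetical and are not taken from the paper, whose exact implementation is not described in this summary.

```python
def harrell_c_index(times, events, risk_scores):
    """Harrell's concordance index for right-censored survival data.

    times       -- follow-up time for each patient
    events      -- 1 if the event was observed, 0 if the patient was censored
    risk_scores -- model output; higher means higher predicted risk
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        # Only patients with an observed event can anchor a comparable pair.
        if events[i] != 1:
            continue
        for j in range(n):
            # Pair (i, j) is comparable when i's event precedes j's follow-up time.
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0   # earlier event had the higher risk score
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5   # tied scores count as half
    if comparable == 0:
        raise ValueError("no comparable pairs in the data")
    return concordant / comparable


# Toy data (hypothetical): four patients, the third one censored.
toy_times = [5, 10, 15, 20]
toy_events = [1, 1, 0, 1]
toy_scores = [0.9, 0.7, 0.4, 0.1]
print(harrell_c_index(toy_times, toy_events, toy_scores))  # prints 1.0
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which is the scale on which the reported C-index should be read.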

The Gradient Boosting Machine (GBM) algorithm yielded the optimal two-variable model, incorporating CD3-CD56+ NK cells and CD8+HLA-DR+ activated cytotoxic T cells (AUC = 0.959). The resulting gbm_risk_score was an independent prognostic factor (HR = 14.516, P = 0.012) and stratified patients into groups with significantly different survival (P < 0.001). Importantly, the gbm_risk_score demonstrated superior predictive performance for 3-year overall survival compared with two traditional inflammatory indices, the neutrophil-to-lymphocyte ratio (NLR) and the platelet-to-lymphocyte ratio (PLR). A nomogram integrating the GBM risk group and primary metastasis status demonstrated excellent predictive accuracy (C-index: 0.883) and clinical utility, successfully identifying a high-risk subgroup among initially non-metastatic patients.
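The two comparator indices, NLR and PLR, are simple ratios of routine blood counts. The sketch below shows how they are conventionally computed, assuming all counts are absolute values in the same unit (for example, 10^9 cells/L). The function names and the example values are hypothetical and do not come from the study.

```python
def nlr(neutrophils, lymphocytes):
    """Neutrophil-to-lymphocyte ratio from absolute counts in the same unit."""
    if lymphocytes <= 0:
        raise ValueError("lymphocyte count must be positive")
    return neutrophils / lymphocytes


def plr(platelets, lymphocytes):
    """Platelet-to-lymphocyte ratio from absolute counts in the same unit."""
    if lymphocytes <= 0:
        raise ValueError("lymphocyte count must be positive")
    return platelets / lymphocytes


# Hypothetical blood counts in 10^9 cells/L, for illustration only.
print(nlr(4.0, 2.0))    # prints 2.0
print(plr(250.0, 2.0))  # prints 125.0
```

Both indices rely on the total lymphocyte count only, whereas the model summarized here uses specific lymphocyte subsets measured by flow cytometry.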

We developed and validated a robust ML-driven prognostic model based on peripheral blood lymphocyte subsets. This model, demonstrating superior prognostic value over conventional inflammatory markers, provides a novel and practical tool for personalized risk assessment in OS, potentially guiding more tailored treatment strategies.

## Linked entities

- **Diseases:** osteosarcoma (MONDO:0002623)

## Full-text entities

- **Genes:** NCAM1 (neural cell adhesion molecule 1) [NCBI Gene 4684] {aka CD56, MSK39, NCAM}, CD8A (CD8 subunit alpha) [NCBI Gene 925] {aka CD8, CD8alpha, IMD116, Leu2, p32}
- **Diseases:** OS (MESH:D012516), inflammatory (MESH:D007249), metastasis (MESH:D009362)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Figures

6 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12891181/full.md

---
Source: https://tomesphere.com/paper/PMC12891181