# From complex algorithms to clinical practice: a multicenter machine learning model and simplified decision tree for predicting cachexia risk in gastric cancer

**Authors:** Jian Zhao, Yu Deng, Yajie Guo, Yaoyao Wu, Xiaozhou Yang, Tengyu Zeng, Yihuan Qiao, Huadong Zhao, Jiawei Song, Beilei Hou, Qianyong Yang

PMC · DOI: 10.3389/fonc.2026.1767547 · Frontiers in Oncology · 2026-03-10

## TL;DR

This multicenter retrospective study of 1,570 gastric cancer patients develops a machine learning model and a simplified decision tree that predict cachexia risk from routine clinical data.

## Contribution

The contribution is an externally validated Random Forest model, plus a simplified decision tree built on three accessible markers (CA19-9, CEA, and albumin), for predicting cachexia in gastric cancer.

## Key findings

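- A Random Forest model performed best among the compared ML models, with AUC = 0.898 on the internal test set and AUC = 0.913 on the external validation set.
- A simplified decision tree using CA19-9, CEA, and albumin retained an AUC above 0.783 and showed positive net benefit in Decision Curve Analysis.
- The analysis drew on data from three independent hospitals, split into training (n=920), internal testing (n=350), and external validation (n=300) cohorts.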

## Abstract

Cachexia is a frequent, specific metabolic syndrome that severely compromises survival in gastric cancer (GC). While early diagnosis is paramount, existing screening methods are limited by complexity and suboptimal accuracy. There is an urgent need for an efficient, data-driven tool derived from routine clinical parameters.

In this multicenter retrospective study, we analyzed data from three independent hospitals. Variable selection was performed using univariable and multivariable analyses. We constructed and compared multiple machine learning (ML) models to predict cachexia risk. The models’ discriminative ability, calibration, and clinical net benefit were comprehensively evaluated via AUC, calibration plots, and Decision Curve Analysis (DCA).
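
As a rough illustration of two of the metrics named above, the following Python sketch computes AUC through its rank interpretation and the net benefit used in Decision Curve Analysis, taken here as the standard definition TP/n − (FP/n) · p_t/(1 − p_t) at threshold probability p_t. The function names and the toy data are assumptions made for illustration; the paper's own evaluation code is not part of this summary.

```python
from typing import Sequence


def auc(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """AUC as the chance that a random positive outscores a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def net_benefit(y_true: Sequence[int], y_score: Sequence[float], threshold: float) -> float:
    """Net benefit of treating everyone whose predicted risk is >= threshold."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(y_true)
    tp = sum(1 for y, s in zip(y_true, y_score) if s >= threshold and y == 1)
    fp = sum(1 for y, s in zip(y_true, y_score) if s >= threshold and y == 0)
    return tp / n - (fp / n) * threshold / (1.0 - threshold)


def treat_all_net_benefit(y_true: Sequence[int], threshold: float) -> float:
    """Reference curve for DCA: net benefit when every patient is treated."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)


# Toy labels and predicted risks (made up, not study data).
labels = [1, 1, 0, 0]
risks = [0.9, 0.4, 0.6, 0.1]
print(auc(labels, risks))
print(net_benefit(labels, risks, 0.2), treat_all_net_benefit(labels, 0.2))
```

A model is usually considered clinically useful at a given threshold when its net benefit exceeds both the treat-all value and zero, which is the treat-none reference.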

The study included 1,570 GC patients (cachexia prevalence: 30.3%). Patients were divided into training (n=920), internal testing (n=350), and external validation (n=300) cohorts. Cachexia was significantly associated with poor nutritional status, elevated inflammation, and inferior overall survival (P < 0.01). The Random Forest (RF) model yielded the best performance, maintaining excellent stability across the internal test set (AUC = 0.898) and external validation set (AUC = 0.913). To enhance clinical utility, we further derived a simplified decision tree model based on three accessible markers: CA19-9, CEA, and albumin. This simplified tool retained high diagnostic accuracy (AUC > 0.783) and demonstrated significant positive net benefits in DCA.
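
To make the idea of a three-marker decision tree concrete, here is a minimal Python sketch of how such a tool could be applied at the bedside. Everything specific in it is hypothetical: the split order, the leaf labels, and the cutoffs, which are conventional laboratory reference limits used as placeholders. The tree structure and thresholds actually learned in the study are not given in this summary, so this must not be read as the authors' model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cutoffs:
    """Placeholder cutoffs (conventional reference limits), NOT the paper's splits."""
    ca19_9_u_per_ml: float = 37.0
    cea_ng_per_ml: float = 5.0
    albumin_g_per_l: float = 35.0


def cachexia_risk_group(
    ca19_9: float,
    cea: float,
    albumin: float,
    cutoffs: Cutoffs = Cutoffs(),
) -> str:
    """Walk a hypothetical two-level tree and return a coarse risk label."""
    low_albumin = albumin < cutoffs.albumin_g_per_l
    raised_marker = (
        ca19_9 > cutoffs.ca19_9_u_per_ml or cea > cutoffs.cea_ng_per_ml
    )
    # First split on albumin, then on whether either tumor marker is raised.
    if low_albumin:
        return "high" if raised_marker else "intermediate"
    return "intermediate" if raised_marker else "low"


# Example call with made-up laboratory values.
print(cachexia_risk_group(ca19_9=80.0, cea=2.0, albumin=30.0))
```

Keeping the cutoffs in a separate `Cutoffs` object means the placeholder values can be replaced with the published splits without touching the traversal logic.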

We successfully established and externally validated a high-performance ML model for predicting GC-associated cachexia. Crucially, the derived simplified decision tree offers a convenient, highly generalizable tool for clinicians to identify high-risk patients using routine laboratory tests, enabling earlier precision nutritional management.

## Linked entities

- **Diseases:** gastric cancer (MONDO:0001056)

## Full-text entities

- **Genes:** ALB (albumin) [NCBI Gene 213] {aka FDAHT, HSA, PRO0883, PRO0903, PRO1341}, CEACAM5 (CEA cell adhesion molecule 5) [NCBI Gene 1048] {aka CD66e, CEA}
- **Diseases:** inflammation (MESH:D007249), metabolic syndrome (MESH:D024821), Cachexia (MESH:D002100), GC (MESH:D013274)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC13008652/full.md

## Figures

6 figures with captions in the complete paper: https://tomesphere.com/paper/PMC13008652/full.md

## References

39 references — full list in the complete paper: https://tomesphere.com/paper/PMC13008652/full.md

---
Source: https://tomesphere.com/paper/PMC13008652