# From biomarker to clinical utility: translating the advanced lung cancer inflammation index into a machine learning-driven risk stratification tool for colorectal cancer

**Authors:** Ming Gao, Ying Li, Huimei Wang, Jinming Zhang, Guangxun Zhang, Nan Zhang

PMC · DOI: 10.1186/s12967-025-07494-z · Journal of Translational Medicine · 2025-11-26

## TL;DR

Higher ALI, a composite marker reflecting better nutritional status and lower inflammation, is associated with lower colorectal cancer risk in NHANES data, and ALI can serve as a feature in a machine learning model for risk prediction.

## Contribution

The study translates the ALI biomarker into a machine learning-based risk stratification tool for colorectal cancer.

## Key findings

- Each unit increase in log-ALI was associated with a 20.9% reduction in CRC risk (OR = 0.791; 95% CI: 0.628–0.997).
- Among seven machine learning models evaluated, LightGBM achieved the highest AUC for CRC prediction (0.870).
- SHAP analysis identified log-ALI as the most important protective feature in the model.

## Abstract

Both nutrition and inflammation have been implicated in the pathogenesis of colorectal cancer (CRC), but most previous studies have examined these factors separately. This study aimed to explore the combined association of inflammation and nutritional status with CRC.

This study selected 101,316 subjects from the National Health and Nutrition Examination Survey (NHANES) conducted from 1999 to 2018. First, weighted logistic regression was used to measure the association between the advanced lung cancer inflammation index (ALI) and CRC. Then, restricted cubic splines (RCS) were used to characterize the dose-response curve, and the discriminative performance of the model was evaluated with the receiver operating characteristic (ROC) curve. Robustness was verified through subgroup and interaction analyses. Random forest analysis combined with the Boruta algorithm was then employed to identify CRC-related factors. Finally, a machine learning (ML) prediction framework was constructed, and the best-performing model was interpreted using SHAP values.
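For orientation, ALI is commonly defined in the literature as BMI × serum albumin / neutrophil-to-lymphocyte ratio (NLR). The sketch below computes the index and its log transform under that definition. The abstract does not state the formula, the units, or the base of the logarithm, so the natural log, the g/dL albumin unit, and the function names `ali` and `log_ali` are assumptions made here for illustration.

```python
import math


def ali(bmi: float, albumin_g_dl: float, neutrophils: float, lymphocytes: float) -> float:
    """Advanced lung cancer inflammation index, assuming the common definition
    ALI = BMI (kg/m^2) * serum albumin (g/dL) / NLR,
    where NLR = neutrophil count / lymphocyte count (same units for both counts).
    """
    if min(bmi, albumin_g_dl, neutrophils, lymphocytes) <= 0:
        raise ValueError("all inputs must be positive")
    nlr = neutrophils / lymphocytes
    return bmi * albumin_g_dl / nlr


def log_ali(bmi: float, albumin_g_dl: float, neutrophils: float, lymphocytes: float) -> float:
    """Log-transformed ALI; the natural log is an assumption, not stated in the abstract."""
    return math.log(ali(bmi, albumin_g_dl, neutrophils, lymphocytes))


if __name__ == "__main__":
    # Hypothetical participant values, chosen only to illustrate the arithmetic.
    print(ali(bmi=25.0, albumin_g_dl=4.0, neutrophils=4.0, lymphocytes=2.0))
    print(log_ali(bmi=25.0, albumin_g_dl=4.0, neutrophils=4.0, lymphocytes=2.0))
```

Because NLR sits in the denominator, a higher neutrophil-to-lymphocyte ratio (more systemic inflammation) lowers ALI, while higher BMI and albumin raise it, which is why higher ALI is read as a more favorable nutritional and inflammatory state.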

In the fully adjusted model, each unit increase in log-transformed ALI was associated with a 20.9% reduction in CRC risk (OR = 0.791; 95% CI: 0.628–0.997; P = 0.047). Participants in the highest log-ALI quartile had a 46.2% lower risk than those in the lowest quartile (OR = 0.538; 95% CI: 0.344–0.842; P = 0.007). The fully adjusted model demonstrated strong discriminative ability (AUC = 0.848). RCS analysis was consistent with a linear dose-response relationship (P for nonlinearity = 0.731). The robustness of these findings was supported by subgroup and sensitivity analyses. Random forest analysis coupled with the Boruta algorithm identified log-ALI as a strong predictor. Among seven machine learning models evaluated, the LightGBM algorithm achieved the highest and most stable predictive performance (AUC = 0.870). SHAP analysis identified log-ALI as the most important protective feature.
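The percentage reductions quoted above follow directly from the odds ratios as (1 − OR) × 100. As a rough consistency check, the sketch below also recovers an approximate two-sided P value from each reported 95% CI using a normal (Wald) approximation on the log-odds scale. This approximation is an assumption made here: the study's survey-weighted models may use a different reference distribution, so small discrepancies would be expected. The function names are illustrative.

```python
import math

Z_95 = 1.959964  # standard normal quantile for a two-sided 95% interval


def percent_reduction(odds_ratio: float) -> float:
    """Percent reduction in odds implied by an odds ratio below 1."""
    return (1.0 - odds_ratio) * 100.0


def wald_p_from_ci(odds_ratio: float, ci_low: float, ci_high: float) -> float:
    """Approximate two-sided P value from an OR and its 95% CI.

    Assumes the CI is symmetric on the log scale (Wald interval), so that
    SE = (ln(upper) - ln(lower)) / (2 * 1.96) and z = ln(OR) / SE.
    """
    beta = math.log(odds_ratio)
    se = (math.log(ci_high) - math.log(ci_low)) / (2.0 * Z_95)
    z = beta / se
    # Two-sided tail probability of the standard normal distribution.
    return math.erfc(abs(z) / math.sqrt(2.0))


if __name__ == "__main__":
    # Values taken from the abstract: per-unit log-ALI, then Q4 vs Q1.
    for or_, lo, hi in [(0.791, 0.628, 0.997), (0.538, 0.344, 0.842)]:
        print(f"OR={or_}: reduction={percent_reduction(or_):.1f}%, "
              f"approx P={wald_p_from_ci(or_, lo, hi):.3f}")
```

Under these assumptions the recovered values agree with the reported 20.9% and 46.2% reductions and with P = 0.047 and P = 0.007 to the rounding shown. Strictly, 1 − OR is a reduction in odds; it approximates a reduction in risk only when the outcome is rare.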

This study demonstrates that higher ALI levels, indicative of better nutritional and inflammatory status, are significantly associated with a lower risk of CRC. The optimized ML model based on ALI shows promise as a cost-effective tool for CRC risk stratification.

The online version contains supplementary material available at 10.1186/s12967-025-07494-z.

## Linked entities

- **Diseases:** colorectal cancer (MONDO:0005575)

## Full-text entities

- **Diseases:** colorectal cancer (MESH:D015179), lung cancer (MESH:D008175), inflammation (MESH:D007249)

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12764094/full.md

## Figures

8 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12764094/full.md

## References

4 references — full list in the complete paper: https://tomesphere.com/paper/PMC12764094/full.md

---
Source: https://tomesphere.com/paper/PMC12764094