# Exploration of the Predictive Value of Peripheral Blood-related Indicators for EGFR Mutations and Prognosis in Non-small Cell Lung Cancer Using Machine Learning

**Authors:** Shulei FU, Shaodi WEN, Jiaqiang ZHANG, Xiaoyue DU, Ru LI, Bo SHEN

PMC · DOI: 10.3779/j.issn.1009-3419.2025.102.05 · Chinese Journal of Lung Cancer · 2025-03-21

## TL;DR

This study applies machine learning to peripheral blood indicators to predict EGFR mutation status in non-small cell lung cancer (NSCLC), offering a non-invasive option for patients who cannot undergo genetic testing.

## Contribution

A novel interpretable machine learning model identifies key blood indicators for predicting EGFR mutations and prognosis in NSCLC.

## Key findings

- The 10 highest-contributing features include pathological type, phosphorus, and eosinophils; the model achieved an AUC of 0.80.
- Low serum sodium and squamous cell carcinoma histology were associated with poorer prognosis (P<0.05).
- The model may provide a reasonably scientific basis for diagnosing and treating patients who cannot undergo genetic testing.
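The AUC of 0.80 has a concrete reading: it estimates the probability that a randomly chosen EGFR-mutant patient receives a higher model score than a randomly chosen wild-type patient. The minimal sketch below computes AUC directly from that pairwise definition, with ties counting as half. The labels and scores are hypothetical toy values, not study data, and this is not the authors' evaluation code.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a random positive outranks a random negative.

    Ties count as half a win. Runs in O(n_pos * n_neg), fine for small cohorts.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data (NOT from the study): 1 = EGFR-mutant, 0 = wild-type
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2]
print(roc_auc(labels, scores))
```

For the toy data, 8 of the 9 mutant/wild-type pairs are ordered correctly, so the AUC is 8/9. The pairwise loop is adequate for a test set of roughly 35 patients (20% of 175).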

## Abstract

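Sensitizing mutations in the epidermal growth factor receptor (EGFR) are one of the effective targets for targeted therapy in non-small cell lung cancer (NSCLC). However, because primary tissue is difficult to obtain in some cases, and because of economic constraints in some less-developed regions, some patients cannot undergo conventional genetic testing. This study aimed to use non-invasive peripheral blood indicators to build machine learning (ML) models, identify biomarkers closely associated with EGFR mutation status in NSCLC, and evaluate their potential prognostic value.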

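Clinical indicators of 2642 lung cancer patients treated at Jiangsu Cancer Hospital between November 2016 and May 2023 were collected retrospectively, and the 175 NSCLC patients with complete follow-up data were included in the study. ML models were built from peripheral blood indicators, with the data split into training and test sets at a ratio of 8:2. An unsupervised learning algorithm was used to cluster the blood features, mutual information was used for feature selection, and a Shapley value-based ensemble learning algorithm was designed to calculate each feature's contribution to the model's predictions. The predictive performance of the model was evaluated with receiver operating characteristic (ROC) curves.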
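Mutual-information feature selection scores each candidate feature by how much knowing it reduces uncertainty about the label (here, EGFR mutation status), then keeps the highest-scoring features. The abstract does not say which estimator the authors used, so the minimal sketch below makes a loud assumption: each continuous blood value is discretized by a hypothetical median split, and discrete mutual information is computed from counts. A library estimator such as scikit-learn's `mutual_info_classif` would be a more typical choice for continuous lab values. Feature names and values are hypothetical.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Mutual information I(X; Y) in nats between two discrete sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    mi = 0.0
    for (x, y), count in joint.items():
        p_xy = count / n
        mi += p_xy * math.log(p_xy / ((px[x] / n) * (py[y] / n)))
    return mi


def median_split(values):
    """ASSUMED discretization (not from the paper): 1 if above the median, else 0."""
    ordered = sorted(values)
    mid = len(ordered) // 2
    if len(ordered) % 2:
        median = ordered[mid]
    else:
        median = (ordered[mid - 1] + ordered[mid]) / 2
    return [1 if v > median else 0 for v in values]


def rank_features(features, labels, top_k):
    """Rank features by MI with the binary label and keep the top_k."""
    scores = {
        name: mutual_information(median_split(vals), labels)
        for name, vals in features.items()
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top_k]


# Hypothetical toy data (NOT from the study): 1 = EGFR-mutant, 0 = wild-type
toy_labels = [1, 1, 1, 0, 0, 0]
toy_features = {
    "feature_a": [5, 6, 7, 1, 2, 3],  # separates the classes cleanly
    "feature_b": [1, 5, 2, 6, 3, 4],  # only weakly related to the label
}
print(rank_features(toy_features, toy_labels, top_k=2))
```

With `top_k=10` and real indicators, the same ranking step would yield a shortlist analogous to the top-10 list in the results, though the study's actual selection procedure may differ.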

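Feature extraction and contribution analysis with the Shapley value-based interpretable ML model identified the 10 indicators with the highest contributions: pathological type, phosphorus, eosinophils, monocyte count, activated partial thromboplastin time, potassium, total bilirubin, sodium, eosinophil percentage, and total cholesterol. The model achieved an area under the curve (AUC) of 0.80. In addition, patients with low serum sodium and those with squamous cell carcinoma had a poorer prognosis (P<0.05).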
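A Shapley value assigns each feature its average marginal contribution to a prediction over all coalitions of the other features. The abstract does not describe the authors' Shapley-based ensemble in detail, so the sketch below only illustrates exact Shapley attribution for a single prediction, under two stated assumptions: a coalition's value is the model output when features outside the coalition are replaced by a single reference point, and the model is a hypothetical three-feature score. Exact enumeration is exponential in the number of features; packages such as SHAP use approximations for larger models.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley values for one prediction.

    ASSUMPTION: v(S) = model output when features in S take their values
    from x and all other features are set to `baseline` (one reference point).
    Enumerates every subset, so this is only feasible for a few features.
    """
    n = len(x)

    def value(subset):
        mixed = [x[j] if j in subset else baseline[j] for j in range(n)]
        return model(mixed)

    phis = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        phi = 0.0
        for size in range(n):  # |S| ranges over 0..n-1
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi += weight * (value(s | {i}) - value(s))
        phis.append(phi)
    return phis


def toy_model(z):
    """Hypothetical score: two main effects plus one interaction term."""
    return 0.5 * z[0] + 0.2 * z[1] + 0.3 * z[0] * z[2]


patient = [1.0, 1.0, 1.0]
reference = [0.0, 0.0, 0.0]
phis = shapley_values(toy_model, patient, reference)
print(phis)
```

The interaction term's 0.3 is split evenly between features 0 and 2, and the three values sum to the model output for the patient minus the output for the reference (the efficiency property). Ranking features by mean absolute Shapley value across patients is one common way to build a "top 10" list; whether the authors aggregated contributions this way is not stated in the abstract.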

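The interpretable model constructed in this study offers a new approach to predicting EGFR mutation status in NSCLC patients and provides a reasonably scientific basis for the diagnosis and treatment of patients who cannot undergo genetic testing.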

Characteristics of NSCLC patients in different groups defined by EGFR mutation status

The top ten features with the highest contribution and their AUC values determined by machine learning

The hazard ratios of the top 10 contributing features and EGFR mutation status in early-stage lung cancer patients

## Linked entities

- **Genes:** EGFR (epidermal growth factor receptor) [NCBI Gene 1956]
- **Diseases:** non-small cell lung cancer (MONDO:0005233), lung cancer (MONDO:0005138)

## Full-text entities

- **Genes:** EGFR (epidermal growth factor receptor) [NCBI Gene 1956] {aka ERBB, ERBB1, ERRP, HER1, NISBD2, NNCIS}
- **Diseases:** Cancer (MESH:D009369), NSCLC (MESH:D002289), squamous cell carcinoma (MESH:D002294), lung cancer (MESH:D008175), hyponatremia (MESH:D007010)
- **Chemicals:** potassium (MESH:D011188), cholesterol (MESH:D002784), phosphorus (MESH:D010758), bilirubin (MESH:D001663)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC11931235/full.md

## Figures

3 figures with captions in the complete paper: https://tomesphere.com/paper/PMC11931235/full.md

## References

26 references — full list in the complete paper: https://tomesphere.com/paper/PMC11931235/full.md

---
Source: https://tomesphere.com/paper/PMC11931235