# NSCLC EGFR Mutation Prediction via Random Forest Model: A Clinical–CT–Radiomics Integration Approach

**Authors:** Anass Benfares, Badreddine Alami, Sara Boukansa, Mamoun Qjidaa, Ikram Benomar, Mounia Serraj, Ahmed Lakhssassi, Mohammed Ouazzani Jamil, Mustapha Maaroufi, Hassan Qjidaa

PMC · DOI: 10.3390/arm93050039 · Advances in Respiratory Medicine · 2025-09-26

## TL;DR

A Random Forest model combining clinical, CT, and radiomic data predicted EGFR mutation status in NSCLC patients with an AUC of 0.91, which may aid early treatment decisions when tissue sampling is limited.

## Contribution

A Random Forest model integrating clinical, CT, and radiomic features predicts EGFR mutation status in NSCLC with an AUC of 0.91 and an accuracy of 0.88 ± 0.03 under stratified five-fold cross-validation.

## Key findings

- The best-performing Random Forest model achieved an AUC of 0.91 in predicting EGFR mutation status.
- Key predictors identified include tobacco use, enhancement pattern, and gray-level-zone entropy.
- Performance differed by class: the F1-score was 0.91 for EGFR-WT and 0.68 for EGFR-Mutant.
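The per-class F1-scores in the findings above are derived from precision and recall, so a gap between the two classes reflects how errors are distributed across them. The following is a minimal sketch of that calculation. The confusion-matrix counts are hypothetical and chosen only for illustration; they are not the paper's data, and the helper name `per_class_metrics` is an assumption.

```python
def per_class_metrics(tp, fp, fn):
    """Return (precision, recall, F1) for one class from raw counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical confusion matrix (NOT taken from the paper):
#                pred WT   pred Mutant
# true WT           80          8
# true Mutant       10         22
# For the WT class, a false positive is a Mutant case predicted as WT.
wt = per_class_metrics(tp=80, fp=10, fn=8)
# For the Mutant class, a false positive is a WT case predicted as Mutant.
mutant = per_class_metrics(tp=22, fp=8, fn=10)

print("EGFR-WT     P=%.2f R=%.2f F1=%.2f" % wt)
print("EGFR-Mutant P=%.2f R=%.2f F1=%.2f" % mutant)
```

With an imbalanced cohort, the same absolute number of misclassifications weighs more heavily on the minority class, which is one plausible reason a minority-class F1-score can sit well below the majority-class value.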

## Abstract

What are the main findings?

Accurate estimation of epidermal growth factor receptor (EGFR) mutation status in NSCLC patients can be achieved through a predictive framework combining clinical, CT, and radiomic information.

The best-performing Random Forest model (11 features) achieved an AUC of 0.91 (95% CI: 0.81–1.00). Subgroup results were EGFR-WT (F1-score = 0.91 ± 0.02) and EGFR-Mutant (F1-score = 0.68 ± 0.04), confirming balanced though differentiated predictive performance.
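To make the reported AUC and its confidence interval more concrete, here is a minimal sketch of one common way to compute them: the AUC as a rank statistic, with a percentile bootstrap for the interval. The summary does not state how the authors obtained their 95% CI, so the bootstrap here is an assumption, and the labels and scores are hypothetical toy values rather than study data.

```python
import random


def auc_rank(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs where the positive
    has the higher score; tied pairs count as half."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(y_true, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the AUC."""
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [y_true[i] for i in idx]
        if len(set(ys)) < 2:
            continue  # resample contains a single class; AUC is undefined
        stats.append(auc_rank(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return lower, upper


# Hypothetical labels (1 = EGFR-Mutant) and predicted probabilities.
y = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
s = [0.10, 0.20, 0.30, 0.35, 0.40, 0.60, 0.45, 0.70, 0.80, 0.90]

auc = auc_rank(y, s)
lo, hi = bootstrap_auc_ci(y, s)
print("AUC = %.3f, 95%% CI = (%.3f, %.3f)" % (auc, lo, hi))
```

On small samples the interval is wide and can reach 1.00 at the upper end, which is consistent in spirit with an interval such as 0.81–1.00 from a cohort of modest size.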

What is the implication of the main finding?

The proposed non-invasive prediction tool may assist in early identification of candidates for tyrosine kinase inhibitor (TKI) therapy when tissue sampling is limited.

This integrative approach supports the development of AI-driven, personalized diagnostic strategies in lung cancer management.

Non-small cell lung cancer (NSCLC) is the leading cause of cancer-related mortality worldwide. Accurate determination of epidermal growth factor receptor (EGFR) mutation status is essential for selecting patients eligible for tyrosine kinase inhibitors (TKIs). However, invasive genotyping is often limited by tissue accessibility and sample quality. This study presents a non-invasive machine learning model that combines clinical data, CT morphological features, and radiomic descriptors to predict EGFR mutation status. A retrospective cohort of 138 patients with confirmed EGFR status and pre-treatment CT scans was analyzed. Radiomic features were extracted with PyRadiomics, and features were selected using mutual information, Spearman correlation, and wrapper-based methods. Five Random Forest models were trained on different feature sets. The best-performing model, based on 11 selected variables, achieved an AUC of 0.91 (95% CI: 0.81–1.00) under stratified five-fold cross-validation, with an accuracy of 0.88 ± 0.03. Subgroup analysis showed that EGFR-WT reached a precision of 0.93 ± 0.04, recall of 0.92 ± 0.03, and F1-score of 0.91 ± 0.02, while EGFR-Mutant reached a precision of 0.76 ± 0.05, recall of 0.71 ± 0.05, and F1-score of 0.68 ± 0.04. SHapley Additive exPlanations (SHAP) analysis identified tobacco use, enhancement pattern, and gray-level-zone entropy as key predictors. Decision curve analysis confirmed clinical utility, supporting the model's role as a non-invasive tool for EGFR screening.
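The decision curve analysis mentioned in the abstract rests on the standard net benefit formula, which weighs true positives against false positives at a chosen threshold probability. The following is a minimal sketch of that formula, not the authors' implementation. The function names, labels, and predicted probabilities are hypothetical and used only for illustration.

```python
def net_benefit(y_true, probs, threshold):
    """Net benefit of acting on patients whose predicted probability is at
    least `threshold`: TP/n - (FP/n) * threshold / (1 - threshold)."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, probs) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, probs) if p >= threshold and y == 0)
    return tp / n - (fp / n) * threshold / (1.0 - threshold)


def net_benefit_treat_all(y_true, threshold):
    """Reference strategy: treat every patient as mutation-positive."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)


# Hypothetical labels (1 = EGFR-Mutant) and model probabilities.
y = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
p = [0.10, 0.20, 0.30, 0.35, 0.40, 0.60, 0.45, 0.70, 0.80, 0.90]

for t in (0.2, 0.3, 0.5):
    model_nb = net_benefit(y, p, t)
    all_nb = net_benefit_treat_all(y, t)
    # The "treat none" strategy always has a net benefit of 0.
    print("threshold=%.1f  model=%.3f  treat-all=%.3f  treat-none=0.000"
          % (t, model_nb, all_nb))
```

A model is usually considered clinically useful over the range of thresholds where its net benefit exceeds both the treat-all and treat-none strategies.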

## Linked entities

- **Genes:** EGFR (epidermal growth factor receptor) [NCBI Gene 1956]
- **Diseases:** Non-small cell lung cancer (MONDO:0005233), NSCLC (MONDO:0005233)

## Full-text entities

- **Genes:** EGFR (epidermal growth factor receptor) [NCBI Gene 1956] {aka ERBB, ERBB1, ERRP, HER1, NISBD2, NNCIS}
- **Diseases:** NSCLC (MESH:D002289), cancer (MESH:D009369)
- **Species:** Nicotiana tabacum (American tobacco, species) [taxon 4097], Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12562246/full.md

## Figures

9 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12562246/full.md

## References

53 references — full list in the complete paper: https://tomesphere.com/paper/PMC12562246/full.md

---
Source: https://tomesphere.com/paper/PMC12562246