# Development and validation of machine learning models for predicting STAS in stage I lung adenocarcinoma with part-solid and solid nodules: a two-center study

**Authors:** Qing-Lin Ren, Liu Lin, Kai Chu, Xin-Rong Xu, Hui-Jun Wang, Jun Wu, Jin-Zhi You, Jun-Xi Hu, Xiao-Lin Wang, Yu-Sheng Shu

PMC · DOI: 10.3389/fonc.2025.1682633 · Frontiers in Oncology · 2025-10-29

## TL;DR

This two-center study develops and validates machine learning models to preoperatively predict spread through air spaces (STAS) in stage I lung adenocarcinoma, with the aim of guiding surgical decisions and patient counseling.

## Contribution

A novel XGBoost-based machine learning model is developed and validated for preoperative prediction of STAS in lung adenocarcinoma.

## Key findings

- The XGBoost model achieved an AUC of 0.889 in training and 0.856 in validation for predicting STAS.
- Calibration curves showed good agreement between model predictions and actual observations.
- SHAP analysis identified CEA, vascular convergence, and proGRP among the most important predictors of STAS.

## Abstract

This study aimed to preoperatively predict spread through air spaces (STAS) in stage I lung adenocarcinoma presenting as part-solid and solid nodules by leveraging clinical features and machine learning models, thereby guiding surgical decision-making and enhancing patient counseling.

A total of 473 patients were retrospectively enrolled, including 353 from our center and 120 from an external validation cohort. Predictive features were selected using the maximum relevance minimum redundancy (mRMR) and least absolute shrinkage and selection operator (LASSO) algorithms. Seven machine learning models were developed: logistic regression, random forest, support vector machine (SVM), extreme gradient boosting (XGBoost), adaptive boosting (AdaBoost), light gradient boosting machine (LightGBM), and category boosting (CatBoost). The models were evaluated using receiver operating characteristic curves, calibration plots, and decision curve analysis (DCA). Feature importance was assessed using Shapley Additive Explanations (SHAP). A web-based nomogram was constructed for clinical application.
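For readers unfamiliar with decision curve analysis, the quantity it plots is the net benefit at a chosen risk threshold. The sketch below uses the standard net-benefit definition (true positives minus false positives weighted by the threshold odds, per patient). The function names and the toy data are hypothetical and invented for illustration; they are not taken from the study.

```python
from typing import Sequence


def net_benefit(y_true: Sequence[int], y_prob: Sequence[float], threshold: float) -> float:
    """Net benefit of acting on patients whose predicted risk is >= threshold."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must be strictly between 0 and 1")
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    # False positives are penalized by the odds of the threshold probability.
    return tp / n - (fp / n) * (threshold / (1.0 - threshold))


def net_benefit_treat_all(y_true: Sequence[int], threshold: float) -> float:
    """Reference strategy: act on every patient regardless of predicted risk."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1.0 - prevalence) * (threshold / (1.0 - threshold))


# Toy labels and predicted risks (assumed values, not study data).
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_prob = [0.9, 0.8, 0.4, 0.3, 0.2, 0.6, 0.7, 0.1]

for t in (0.2, 0.5):
    model_nb = net_benefit(y_true, y_prob, t)
    all_nb = net_benefit_treat_all(y_true, t)
    print(f"threshold={t:.1f}  model={model_nb:.3f}  treat-all={all_nb:.3f}")
```

A model is usually described as clinically useful over the range of thresholds where its curve lies above both the treat-all curve and the treat-none line (net benefit 0).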

STAS was present in 44.76% of the training set and 50.83% of the validation cohort. Seven predictors were selected to construct the predictive models. The XGBoost model performed best, with an AUC of 0.889 (95% CI, 0.852–0.926) in training and 0.856 (95% CI, 0.789–0.928) in validation. Calibration curves in both the training and validation sets showed good agreement between predicted and observed outcomes. Decision curve analysis (DCA) indicated significant clinical utility. SHAP analysis identified the most important predictors for STAS as CEA, vascular convergence, proGRP, age, AFP, smoking history, and CTR.
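An AUC with a 95% confidence interval, as reported above, can be computed in several ways, and this summary does not state which method the authors used. The following is a minimal sketch assuming a percentile bootstrap, with the AUC computed as the fraction of positive-negative pairs that are ranked correctly. The helper names, the resample count, and the toy data are all hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def auc(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """AUC as P(score of a positive > score of a negative); ties count 0.5."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(
    y_true: Sequence[int],
    y_score: Sequence[float],
    n_boot: int = 2000,
    alpha: float = 0.05,
    seed: int = 0,
) -> Tuple[float, float]:
    """Percentile bootstrap interval for the AUC (an assumed method, see lead-in)."""
    rng = random.Random(seed)
    n = len(y_true)
    stats: List[float] = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [y_true[i] for i in idx]
        if 0 < sum(ys) < n:  # skip resamples that contain only one class
            stats.append(auc(ys, [y_score[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Toy labels and scores (assumed values, not study data).
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_score = [0.9, 0.8, 0.4, 0.3, 0.2, 0.6, 0.7, 0.1]

point = auc(y_true, y_score)
lower, upper = bootstrap_auc_ci(y_true, y_score)
print(f"AUC = {point:.4f}, 95% CI = ({lower:.3f}, {upper:.3f})")
```

With only eight toy cases the interval is very wide; cohorts of several hundred patients, as in this study, give much narrower intervals.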

The XGBoost model provides robust preoperative prediction of STAS and may assist clinicians in optimizing surgical strategies for patients with stage I lung adenocarcinoma.

## Linked entities

- **Diseases:** lung adenocarcinoma (MONDO:0005061)

## Full-text entities

- **Genes:** AFP (alpha fetoprotein) [NCBI Gene 174] {aka AFPD, FETA, HPAFP}
- **Diseases:** stage I lung adenocarcinoma (MESH:D000077192)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12605206/full.md

## Figures

6 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12605206/full.md

## References

54 references — full list in the complete paper: https://tomesphere.com/paper/PMC12605206/full.md

---
Source: https://tomesphere.com/paper/PMC12605206