# Exploratory Algorithms to Aid in Risk of Malignancy Prediction for Indeterminate Pulmonary Nodules

**Authors:** Laurel Jackson, Claire Auger, Nicolette Jeanblanc, Christopher Jacobson, Kinnari Pandya, Susan Gawel, Hita Moudgalya, Akanksha Sharma, Christopher W. Seder, Michael J. Liptay, Ramya Gaddikeri, Nicole M. Geissen, Palmi Shah, Jeffrey A. Borgia, Gerard J. Davis

PMC · DOI: 10.3390/cancers17071231 · Cancers · 2025-04-05

## TL;DR

This pilot study develops a machine learning model that combines clinical, radiographic, and circulating-biomarker data to predict whether indeterminate pulmonary nodules are malignant, with the aim of reducing diagnostic delay.

## Contribution

The novel contribution is a machine learning model integrating clinical, radiographic, and biomarker data to predict malignancy in indeterminate pulmonary nodules.

## Key findings

- The model achieved an AUC of 0.872 in training and 0.842 in testing, outperforming the Mayo Score model (AUC of 0.816 in training and 0.787 in testing).
- The final model retained age, lesion size, pack-years, history of extrathoracic cancer, upper lobe location, and spiculation, plus the biomarkers hs-CRP, NSE, Ferritin, and CA-125.
- Pending further validation, the model could reduce the need for serial imaging and the risk of diagnostic delay in patients with indeterminate nodules.
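
For context on the Mayo Score baseline, the abstract lists its six inputs: age, lesion size, history of smoking, history of extrathoracic cancer, upper lobe location, and spiculation. The sketch below shows how such a logistic score turns those inputs into a probability. The coefficients are the values commonly cited for the Mayo Clinic model (Swensen et al., 1997); they do not appear in this summary, and whether the study used exactly this parameterization is an assumption. The function and argument names are hypothetical.

```python
import math

# Commonly cited Mayo Clinic model coefficients (Swensen et al., 1997).
# Assumption: not taken from this summary; the study may have used a variant.
MAYO_INTERCEPT = -6.8272
MAYO_COEFFS = {
    "age_years": 0.0391,
    "ever_smoker": 0.7917,
    "extrathoracic_cancer": 1.3388,
    "diameter_mm": 0.1274,
    "spiculation": 1.0407,
    "upper_lobe": 0.7838,
}


def mayo_probability(age_years, diameter_mm, ever_smoker,
                     extrathoracic_cancer, spiculation, upper_lobe):
    """Return an estimated probability of malignancy from a logistic score."""
    features = {
        "age_years": float(age_years),
        "diameter_mm": float(diameter_mm),
        # Binary inputs are coded as 0 or 1.
        "ever_smoker": int(bool(ever_smoker)),
        "extrathoracic_cancer": int(bool(extrathoracic_cancer)),
        "spiculation": int(bool(spiculation)),
        "upper_lobe": int(bool(upper_lobe)),
    }
    log_odds = MAYO_INTERCEPT + sum(
        MAYO_COEFFS[name] * value for name, value in features.items()
    )
    # Logistic link maps log-odds to a probability in (0, 1).
    return 1.0 / (1.0 + math.exp(-log_odds))


# Hypothetical patient, for illustration only (not data from the study).
example = mayo_probability(
    age_years=65, diameter_mm=15, ever_smoker=True,
    extrathoracic_cancer=False, spiculation=True, upper_lobe=True,
)
print(f"Estimated probability of malignancy: {example:.2f}")
```

With these inputs the score comes out at roughly 0.56. The study's own model adds pack-years and four biomarkers to this kind of variable set, but its fitted coefficients are not given in this summary, so no equivalent function is sketched for it here.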

## Abstract

Annual low-dose CT-based lung cancer screening has been demonstrated to reduce patient mortality relative to chest X-rays, but has significant challenges in distinguishing malignant from non-malignant radiographic findings. To address this clinical need, we applied machine learning to readily accessible clinical information (i.e., patient demographics, clinical characteristics, and radiographic parameters) along with measurements from common circulating biomarkers to develop a method to aid providers with clinical decision making and reduce diagnostic delay inherent to current treatment standards.

Background/Objectives: Lung cancer screening can reduce patient mortality, but multiple issues persist, including the timely management of patients with a radiologically defined indeterminate pulmonary nodule (IPN), which carries unknown pathological significance. This pilot study assessed whether combining demographic, clinical, and radiographic variables with common circulating biomarkers can aid in IPN risk of malignancy prediction.

Methods: A case-control cohort of 379 patients with IPNs (251 stage I lung tumors and 128 nonmalignant nodules) was divided into training (70%) and testing (30%) sets. Demographic variables (age, sex, race, ethnicity), radiographic information (nodule size and location), smoking pack-years, and plasma biomarker levels of CA-125, SCC, CEA, HE4, ProGRP, NSE, Cyfra 21-1, IL-6, PlGF, sFlt-1, hs-CRP, Ferritin, IgG, IgE, IgM, IgA, and Kappa and Lambda Free Light Chains were assessed.

Results: Multivariable analyses of biomarker, demographic, and radiographic variables yielded a model consisting of age, lesion size, pack-years, history of extrathoracic cancer, upper lobe location, spiculation, hs-CRP, NSE, Ferritin, and CA-125 (AUC = 0.872 in training, 0.842 in testing). This model outperformed the Mayo Score model, which consists of age, lesion size, history of smoking, history of extrathoracic cancer, upper lobe location, and spiculation (AUC = 0.816 in training, 0.787 in testing).

Conclusions: A simple reduced algorithm consisting of biomarkers, clinical information, and demographic variables may have value for malignancy prediction of screen-detected IPNs. Upon further validation, this method stands to reduce the need for serial radiographic studies and the risks of diagnostic delay.
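
The evaluation described in the abstract rests on two mechanics: a 70/30 split of the 379-patient cohort and an AUC computed on each part. The following is a minimal sketch of both using only the Python standard library. It assumes a split stratified by class and computes AUC through the Mann-Whitney rank formulation; the summary does not say whether the authors stratified or which software they used, and the helper names `stratified_split` and `auc` are hypothetical.

```python
import random


def auc(labels, scores):
    """AUC as the probability that a malignant case outscores a benign one.

    Equivalent to the Mann-Whitney U statistic; ties count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case of each class")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def stratified_split(labels, train_frac=0.70, seed=0):
    """Return (train_indices, test_indices), splitting each class separately.

    Assumption: stratification by class; the summary only states 70%/30%.
    """
    rng = random.Random(seed)
    train, test = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        cut = round(train_frac * len(idx))
        train.extend(idx[:cut])
        test.extend(idx[cut:])
    return train, test


# Class counts from the abstract: 251 malignant (1) and 128 nonmalignant (0).
labels = [1] * 251 + [0] * 128
train_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(test_idx))

# Toy scores for illustration only (not study data): three of the four
# malignant/benign pairs are ranked correctly.
toy_auc = auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1])
print(toy_auc)
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation, which is the scale on which the reported 0.842 (combined model) and 0.787 (Mayo Score) testing values should be read.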

## Linked entities

- **Proteins:** MUC16 (mucin 16, cell surface associated), SERPINB3 (serpin family B member 3), CEACAM5 (CEA cell adhesion molecule 5), WFDC2 (WAP four-disulfide core domain 2), GRP (gastrin releasing peptide), ENO2 (enolase 2), IL6 (interleukin 6), PGF (placental growth factor), Flt1 (FMS-like tyrosine kinase 1), ferritin (soma ferritin-like), IGG (Immunoglobulin G level), IGHE (immunoglobulin heavy constant epsilon), CD40LG (CD40 ligand), CD79A (CD79a molecule), Igk (immunoglobulin kappa chain complex)
- **Diseases:** lung cancer (MONDO:0005138)

## Full-text entities

- **Genes:** GRP (gastrin releasing peptide) [NCBI Gene 2922] {aka BN, GRP-10, preproGRP, proGRP}, MUC16 (mucin 16, cell surface associated) [NCBI Gene 94025] {aka CA125}, LOC102723407 (immunoglobulin heavy variable 4-38-2-like) [NCBI Gene 102723407] {aka IGHV4, IGHV4-30, IGHV4-38-2, IGHV4-39, IGHV4-b, IGVH4-39}, IL6 (interleukin 6) [NCBI Gene 3569] {aka BSF-2, BSF2, CDF, HGF, HSF, IFN-beta-2}, IGHE (immunoglobulin heavy constant epsilon) [NCBI Gene 3497] {aka IgE}, ENO2 (enolase 2) [NCBI Gene 2026] {aka HEL-S-279, NSE}, CEACAM3 (CEA cell adhesion molecule 3) [NCBI Gene 1084] {aka CD66D, CEA, CGM1, CGM1a, W264, W282}, WFDC2 (WAP four-disulfide core domain 2) [NCBI Gene 10406] {aka BENP, EDDM4, HE4, WAP5, dJ461P17.6}, PGF (placental growth factor) [NCBI Gene 5228] {aka D12S1900, PGFL, PIGF, PLGF, PlGF-2, SHGC-10760}
- **Diseases:** Lung cancer (MESH:D008175), IPN (MESH:D055613), Malignancy (MESH:D009369)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC11988104/full.md

## Figures

2 figures with captions in the complete paper: https://tomesphere.com/paper/PMC11988104/full.md

## References

33 references — full list in the complete paper: https://tomesphere.com/paper/PMC11988104/full.md

---
Source: https://tomesphere.com/paper/PMC11988104