Development and external validation of a lung cancer risk estimation tool using gradient-boosting
Pierre-Louis Benveniste, Julie Alberge, Lei Xing, Jean-Emmanuel Bibault

TL;DR
This study develops and validates a gradient-boosting machine learning tool using large datasets to predict five-year lung cancer risk, aiding early detection and personalized screening decisions.
Contribution
The paper introduces an ML-based risk estimation tool, validated on external data, that achieves higher precision than the USPSTF screening guidelines at similar recall.
Findings
The model achieved a ROC-AUC of 82% on the PLCO dataset and 70% on the NLST external validation dataset.
The tool's precision was higher than USPSTF guidelines at similar recall levels.
A web application was developed for individual risk assessment.
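The two metrics in these findings can be made concrete with a minimal sketch. It is not the authors' evaluation code: the labels, scores, and eligibility flags below are toy values invented for illustration, and the function names are hypothetical. It computes ROC-AUC as the probability that a randomly chosen case outscores a randomly chosen non-case, then compares the precision of a score threshold against a binary eligibility rule at matched recall.

```python
def roc_auc(labels, scores):
    """ROC-AUC via pairwise comparison: P(score of a case > score of a non-case).

    Ties between a case and a non-case count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case and one non-case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def precision_recall(labels, flags):
    """Precision and recall of a binary decision (flags) against true labels."""
    tp = sum(1 for y, f in zip(labels, flags) if y == 1 and f)
    fp = sum(1 for y, f in zip(labels, flags) if y == 0 and f)
    fn = sum(1 for y, f in zip(labels, flags) if y == 1 and not f)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def precision_at_recall(labels, scores, target_recall):
    """Highest score threshold whose recall reaches target_recall.

    Returns (precision, recall, threshold) at that operating point.
    """
    for thr in sorted(set(scores), reverse=True):
        flags = [s >= thr for s in scores]
        precision, recall = precision_recall(labels, flags)
        if recall >= target_recall:
            return precision, recall, thr
    raise ValueError("target recall is not reachable")


# Toy data (invented): 1 = lung cancer within five years, 0 = no cancer.
labels = [1, 0, 1, 0, 0, 1, 0, 0]
# Hypothetical model risk scores for the same eight people.
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.6, 0.2, 0.1]
# Hypothetical binary eligibility rule standing in for a guideline.
rule_flags = [True, True, True, True, False, True, False, False]

auc = roc_auc(labels, scores)
rule_precision, rule_recall = precision_recall(labels, rule_flags)
model_precision, model_recall, threshold = precision_at_recall(
    labels, scores, target_recall=rule_recall
)
print(f"AUC={auc:.3f}")
print(f"rule:  precision={rule_precision:.2f} recall={rule_recall:.2f}")
print(f"model: precision={model_precision:.2f} recall={model_recall:.2f} thr={threshold}")
```

Matching recall before comparing precision keeps the comparison fair: both decision rules catch the same share of true cases, so any difference in precision reflects how many people without cancer each rule flags.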
Abstract
Lung cancer is a significant cause of mortality worldwide, emphasizing the importance of early detection for improved survival rates. In this study, we propose a machine learning (ML) tool trained on data from the PLCO Cancer Screening Trial and validated on the NLST to estimate the likelihood of lung cancer occurrence within five years. The study utilized two datasets, the PLCO (n=55,161) and NLST (n=48,595), consisting of comprehensive information on risk factors, clinical measurements, and outcomes related to lung cancer. Data preprocessing involved removing patients who were not current or former smokers and those who had died of causes unrelated to lung cancer. Additionally, a focus was placed on mitigating bias caused by censored data. Feature selection, hyper-parameter optimization, and model calibration were performed using XGBoost, an ensemble learning algorithm that combines…
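The cohort filtering described in the abstract can be sketched as follows. This is an illustrative assumption about how the two exclusion rules might be applied, not the authors' preprocessing code: the `Participant` fields, their values, and the function names are all hypothetical and do not correspond to actual PLCO or NLST variable names.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Participant:
    # All field names are hypothetical placeholders.
    smoking_status: str          # "current", "former", or "never"
    died: bool                   # died during follow-up
    death_cause: Optional[str]   # "lung_cancer", "other", or None if alive
    lung_cancer_within_5y: bool  # outcome label


def is_eligible(p: Participant) -> bool:
    """Apply the two exclusion rules stated in the abstract."""
    # Rule 1: keep only current or former smokers.
    if p.smoking_status not in {"current", "former"}:
        return False
    # Rule 2: drop participants who died of a cause unrelated to lung cancer.
    if p.died and p.death_cause != "lung_cancer":
        return False
    return True


def filter_cohort(cohort: List[Participant]) -> List[Participant]:
    """Return the participants that pass both exclusion rules."""
    return [p for p in cohort if is_eligible(p)]


# Invented example records.
cohort = [
    Participant("current", False, None, True),
    Participant("never", False, None, False),          # excluded: never smoked
    Participant("former", True, "other", False),       # excluded: unrelated death
    Participant("former", True, "lung_cancer", True),
]
kept = filter_cohort(cohort)
print(f"kept {len(kept)} of {len(cohort)} participants")
```

Excluding deaths from unrelated causes removes people whose five-year outcome cannot be observed, which is one simple way to limit the censoring bias the abstract mentions; the paper may handle this differently.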
Taxonomy
Topics: Lung Cancer Diagnosis and Treatment · Radiomics and Machine Learning in Medical Imaging
