Reproducible Benchmarking for Lung Nodule Detection and Malignancy Classification Across Multiple Low-Dose CT Datasets
Fakrul Islam Tushar, Avivah Wang, Lavsen Dahal, Ehsan Samei, Michael R. Harowicz, Jayashree Kalpathy-Cramer, Kyle J. Lafata, Tina D. Tailor, Cynthia Rudin, Joseph Y. Lo

TL;DR
This paper introduces a reproducible, multi-dataset benchmark for lung nodule detection and malignancy classification in low-dose CT scans, highlighting dataset influence on AI model performance and promoting transparent evaluation.
Contribution
It establishes a public benchmark across multiple datasets and evaluates various training strategies, emphasizing the importance of dataset characteristics in AI performance for lung cancer detection.
Findings
Detection models trained on clinically curated datasets perform better on external test data.
Pretraining strategies such as Strategic Warm-Start improve malignancy classification.
Dataset heterogeneity significantly impacts AI model generalizability.
Abstract
Evaluation of artificial intelligence (AI) models for low-dose CT lung cancer screening is limited by heterogeneous datasets, annotation standards, and evaluation protocols, making performance difficult to compare and translate across clinical settings. We establish a public, reproducible multi-dataset benchmark for lung nodule detection and nodule-level cancer classification and quantify cross-dataset generalizability. Using the Duke Lung Cancer Screening (DLCS) dataset as a clinically curated development set, we evaluate performance across LUNA16/LIDC-IDRI, NLST-3D, and LUNA25. Detection models trained on DLCS and LUNA16 were evaluated externally on NLST-3D using free-response ROC analysis. For malignancy classification, we compared five strategies: randomly initialized ResNet50, Models Genesis, Med3D, a Foundation Model for Cancer Biomarkers, and a Strategic Warm-Start…
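The free-response ROC (FROC) analysis mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' evaluation code: the function name `froc_sensitivities`, its arguments, and the default operating points (1/8 to 8 false positives per scan, the convention used by the LUNA16 challenge) are assumptions made for illustration. It also assumes each candidate has already been matched against the ground truth, with at most one candidate counted as a hit per nodule.

```python
import numpy as np


def froc_sensitivities(scores, is_hit, n_nodules, n_scans,
                       fp_rates=(0.125, 0.25, 0.5, 1, 2, 4, 8)):
    """Sensitivity at given average false positives per scan (hypothetical helper).

    scores    : confidence of every candidate detection, pooled over all scans
    is_hit    : True where the candidate matches a ground-truth nodule
    n_nodules : total number of ground-truth nodules
    n_scans   : total number of scans evaluated
    """
    scores = np.asarray(scores, dtype=float)
    hits = np.asarray(is_hit, dtype=bool)

    # Sweep the threshold from the most to the least confident candidate.
    order = np.argsort(-scores, kind="stable")
    hits = hits[order]

    sensitivity = np.cumsum(hits) / n_nodules
    fps_per_scan = np.cumsum(~hits) / n_scans

    results = []
    for rate in fp_rates:
        allowed = fps_per_scan <= rate
        # Best sensitivity reachable without exceeding the FP budget;
        # 0.0 if even the top-scoring candidate already exceeds it.
        results.append(float(sensitivity[allowed].max()) if allowed.any() else 0.0)
    return results


# Toy example: 5 candidates over 2 scans containing 2 nodules in total.
example = froc_sensitivities(
    scores=[0.9, 0.8, 0.7, 0.6, 0.5],
    is_hit=[True, False, True, False, False],
    n_nodules=2,
    n_scans=2,
    fp_rates=(0.25, 0.5, 1),
)
print(example)  # [0.5, 1.0, 1.0]
```

The sketch treats tied scores as distinct thresholds and does no interpolation between operating points; a full evaluation would also need the candidate-to-nodule matching rule, which the abstract does not specify.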
Taxonomy
Topics: Radiomics and Machine Learning in Medical Imaging · Advanced X-ray and CT Imaging · Lung Cancer Diagnosis and Treatment
Methods: 3 Dimensional Convolutional Neural Network
