Beyond Benchmarks: A Framework for Post Deployment Validation of CT Lung Nodule Detection AI
Daniel Soliman

TL;DR
This paper introduces a physics-guided framework for evaluating the robustness of lung nodule detection AI models to variations in CT scan parameters, highlighting the impact of slice thickness on performance.
Contribution
The study presents a reproducible, resource-efficient framework for post-deployment validation of AI in clinical CT imaging, and shows that slice thickness affects detection performance more than dose-related noise does.
Findings
Increasing slice thickness (to 3 mm or 5 mm) significantly reduces detection sensitivity.
Dose reduction causes only slight performance degradation.
Detection performance is heterogeneous across cases.
Abstract
Background: Artificial intelligence (AI)-assisted lung nodule detection systems are increasingly deployed in clinical settings without site-specific validation. Performance reported under benchmark conditions may not reflect real-world behavior when acquisition parameters differ from training data. Purpose: To propose and demonstrate a physics-guided framework for evaluating the sensitivity of a deployed lung nodule detection model to systematic variation in CT acquisition parameters. Methods: Twenty-one cases from the publicly available LIDC-IDRI dataset were evaluated using a MONAI RetinaNet model pretrained on LUNA16 (fold 0, no fine-tuning). Five imaging conditions were tested: baseline, 25% dose reduction, 50% dose reduction, 3 mm slice thickness, and 5 mm slice thickness. Dose reduction was simulated via image-domain Gaussian noise; slice thickness via moving average along the…
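The two perturbations named in the Methods can be illustrated with a minimal NumPy sketch. It is not the paper's implementation, and several details are assumptions: the baseline noise level (`baseline_sigma_hu`), the rule that noise variance scales as 1/dose, the number of slices averaged (`n_slices`), and all function names are hypothetical. The abstract is truncated before it names the averaging axis, so `axis` is left as a parameter rather than fixed.

```python
import numpy as np


def added_noise_sigma(baseline_sigma_hu, dose_fraction):
    """Std of the extra Gaussian noise needed to mimic a lower-dose scan.

    Assumption: noise variance scales as 1/dose, so the total noise at
    dose_fraction is baseline_sigma / sqrt(dose_fraction); the extra,
    independent component is the difference in quadrature.
    """
    if not 0.0 < dose_fraction <= 1.0:
        raise ValueError("dose_fraction must be in (0, 1]")
    return baseline_sigma_hu * np.sqrt(1.0 / dose_fraction - 1.0)


def simulate_dose_reduction(volume_hu, dose_reduction, baseline_sigma_hu=10.0, seed=0):
    """Add zero-mean Gaussian noise in the image domain.

    dose_reduction=0.25 stands for a 25% dose reduction. The default
    baseline_sigma_hu is a placeholder, not a value from the paper.
    """
    rng = np.random.default_rng(seed)
    sigma = added_noise_sigma(baseline_sigma_hu, 1.0 - dose_reduction)
    return volume_hu + rng.normal(0.0, sigma, size=volume_hu.shape)


def simulate_thicker_slices(volume_hu, n_slices, axis=0):
    """Moving average over n_slices adjacent slices along the given axis.

    Uses a 'valid' window, so the output is shorter by n_slices - 1
    along that axis. n_slices depends on the native slice thickness,
    which the abstract does not state.
    """
    kernel = np.ones(n_slices) / n_slices
    return np.apply_along_axis(
        lambda profile: np.convolve(profile, kernel, mode="valid"),
        axis,
        volume_hu,
    )
```

For example, `simulate_dose_reduction(volume, 0.50)` adds noise with the same standard deviation as the assumed baseline, and `simulate_thicker_slices(volume, 3)` averages each run of three neighbouring slices. A real study would calibrate the baseline noise from the scans themselves instead of using a fixed placeholder.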
