# A Novel Automated Algorithm to Identify Lung Cancer Screening from Free Text of Radiology Orders

**Authors:** Alison S. Rustagi, Marzieh Vali, Francis J. Graham, Emily N. Lum, Christopher G. Slatore, Salomeh Keyhani

PMC · DOI: 10.1007/s11606-025-09429-2 · 2025-02-25

## TL;DR

A new algorithm accurately identifies lung cancer screening scans by analyzing radiology order text, outperforming traditional administrative codes.

## Contribution

A novel automated algorithm that improves accuracy in identifying lung cancer screening scans compared to administrative codes.

## Key findings

- Among LCS-eligible individuals, the algorithm achieved 97% sensitivity (95% CI 91–99%) and 79% specificity (58–93%) relative to manual chart review.
- Only 69% of scans classified as screening via administrative codes were truly screening, compared to 95% via the algorithm.
- Algorithm performance was similar regardless of LCS eligibility, with 90% PPV and 93% NPV in the overall population regardless of smoking history.

## Abstract

Lung cancer screening (LCS) is recommended for asymptomatic patients. Administrative codes for LCS may capture tests prompted by signs/symptoms.

The study aimed to validate an automated algorithm that identifies LCS among asymptomatic patients.

In this cross-sectional study, an algorithm was iteratively developed to identify outpatient low-dose chest CT scans via Current Procedural Terminology (CPT) codes, search free text of radiology orders for screening terms and signs/symptoms (e.g., cough), and classify scans as screening or not.
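The three steps described above (filter by procedure code, search the order text, classify) can be sketched roughly as follows. This is a minimal illustration, not the authors' algorithm: the code set, both term lists, the rule that a symptom term overrides a screening term, and the function name `classify_order` are all assumptions made for the example. Negation ("no cough") is not handled.

```python
import re

# Illustrative placeholders only; the paper's actual code and term lists
# are not given in this summary.
LOW_DOSE_CT_CODES = {"71271", "G0297"}
SCREENING_TERMS = ("screening", "lung cancer screen", "lcs")
SYMPTOM_TERMS = ("cough", "hemoptysis", "weight loss", "chest pain")


def _has_term(text, terms):
    """Return True if any term appears in text as a whole word or phrase."""
    return any(re.search(r"\b" + re.escape(t) + r"\b", text) for t in terms)


def classify_order(procedure_code, order_text):
    """Classify one radiology order (hypothetical rule set).

    Returns "not_low_dose_ct", "non_screening", "screening", or "unclassified".
    """
    # Step 1: keep only low-dose chest CT orders, identified by code.
    if procedure_code not in LOW_DOSE_CT_CODES:
        return "not_low_dose_ct"

    # Step 2: search the free text for screening terms and signs/symptoms.
    text = order_text.lower()
    has_symptom = _has_term(text, SYMPTOM_TERMS)
    has_screening = _has_term(text, SCREENING_TERMS)

    # Step 3: classify. Assumption: any sign/symptom makes the scan
    # non-screening, even if a screening term is also present.
    if has_symptom:
        return "non_screening"
    if has_screening:
        return "screening"
    return "unclassified"
```

The `"unclassified"` outcome mirrors the abstract's note that some scans could not be classified by the final algorithm; how the authors actually decided that is not described here.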

Participants were a national population-based sample of 4503 adults ages 65–80 in Veterans Health Affairs primary care, with detailed smoking history to identify LCS-eligible individuals (30+ pack-years, current tobacco use, or quit <15 years prior).

The main measures were algorithm specificity, sensitivity, positive predictive value (PPV), and negative predictive value (NPV) relative to manual chart review (gold standard), performed on 100% of screening scans and a >10% random sample of non-screening scans.
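For reference, the four measures follow from a 2×2 table of algorithm result versus chart review. The sketch below uses the standard definitions. The function name `diagnostic_metrics` and the counts in the usage example are hypothetical and are not the study's data. It also ignores any reweighting that the different sampling fractions for screening and non-screening scans might call for.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Compute standard accuracy measures from confusion-matrix counts.

    tp: algorithm says screening, chart review agrees
    fp: algorithm says screening, chart review says non-screening
    fn: algorithm says non-screening, chart review says screening
    tn: algorithm says non-screening, chart review agrees
    Assumes every denominator is nonzero.
    """
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }


# Hypothetical counts, for illustration only.
metrics = diagnostic_metrics(tp=90, fp=10, fn=5, tn=45)
```

With these made-up counts, PPV and NPV both come out to 0.9. PPV is the measure behind the comparison between administrative codes and the algorithm: of the scans flagged as screening, the share that chart review confirms.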

Chart review was conducted on n = 335 scans. The final algorithm could not classify 22% of scans, of which 73% were non-screening; these were excluded from primary analyses. Among 842 LCS-eligible individuals, the algorithm demonstrated 97% sensitivity (95% CI 91–99%) and 79% specificity (58–93%). Only 69% (61–77%) of scans classified as LCS via administrative codes were truly screening, compared to 95% of those classified as screening via the algorithm (p < 0.001). Algorithm performance was similar regardless of LCS eligibility, with 90% PPV (84–94%) and 93% NPV (86–97%) in the overall population regardless of tobacco cigarette history.

An automated algorithm can accurately identify screening versus diagnostic chest imaging, a necessary step to unbiased analyses of LCS in non-randomized settings. Studies should assess the accuracy of administrative codes for LCS in other health systems.

The online version contains supplementary material available at 10.1007/s11606-025-09429-2.

## Linked entities

- **Diseases:** lung cancer (MONDO:0005138)

## Full-text entities

- **Diseases:** Lung Cancer (MESH:D008175), cough (MESH:D003371)
- **Species:** Homo sapiens (human, species) [taxon 9606], Nicotiana tabacum (American tobacco, species) [taxon 4097]

## Figures

1 figure with captions in the complete paper: https://tomesphere.com/paper/PMC12045916/full.md

---
Source: https://tomesphere.com/paper/PMC12045916