Lung cancer screening with AI can discover cures for many early diseases. A public utility can make sure it happens
James L. Mulshine, Bruce S. Pyenson

TL;DR
AI-enhanced lung cancer screening using CT scans can detect early signs of other diseases, and making this data publicly accessible could accelerate medical innovation.
Contribution
The paper proposes using chest CT screening data as a public utility to overcome research bottlenecks and enable AI tool development for early disease detection.
Findings
CT scans from lung cancer screening contain valuable data for detecting other diseases like coronary artery disease.
AI tools can help manage the complex workflow of chest CT screening and improve efficiency.
Repurposing existing CT data collections through web-based strategies could support new analyses and AI development.
Abstract
Many nations around the world are now implementing CT-based lung cancer screening (LCS). Growing evidence led the United States to require insurance coverage for LCS in high-risk individuals. Current CT scanners can obtain vast amounts of anatomic and quantitative information from the viscera of the chest cavity, and it has become evident that the CT images obtained from LCS contain additional health information, including information that enables the early detection of other major tobacco-associated diseases, such as coronary artery disease and emphysema. Chest CT screening is now being integrated with the use of AI tools, and such tools will be essential to organize and manage the complex screening workflow required to efficiently deliver this rapidly expanding service. A threat to realizing the health benefits of chest CT screening is the difficulty in aggregating sufficient numbers of CT…
Introduction
Lung cancer remains the world’s most lethal cancer (1). Fortunately, lung cancer screening (LCS) has objectively been shown to reduce lung cancer mortality in long-term follow-up (2–4). Detection of early-stage lung cancer before symptoms develop generally results in curative outcomes from either minimally invasive surgery or radiotherapy, with modest risk of untoward treatment side effects. Implementation of lung cancer screening is ongoing internationally with preservation of the mortality benefit (5). However, uptake of screening has not been robust, and measures to increase uptake have not yet been successful (6). A concern exists that the full benefits of chest CT screening have not been communicated to either the medical community or the public. While considerable enthusiasm exists about the potential public health benefit of detecting additional, major co-morbid tobacco-related diseases with chest CT screening, lung cancer screening may provide other significant public health benefits that have not yet been considered (7, 8).
With best-practice lung cancer screening, the chest CT imaging data as well as clinical follow-up information are preserved, which allows the detection of lung cancers that may emerge over years of sequential screening images. Recent publications have already used such image collections for further optimizing screening care and for developing or validating artificial intelligence (AI) tools, among other purposes (9, 10).
The potential of real-world data, such as images from LCS, to support clinical innovation is attracting attention from regulators. Former Food and Drug Administration Commissioner Dr. Robert Califf recently called for a fundamental reconsideration of evidence sources, including electronic health record information, to accelerate healthcare innovation (11).
With ongoing innovations within CT technology, image quality continues to improve while medical radiation doses continue to decrease (12). This dynamic may improve LCS participation. However, even with slow uptake, global implementation of screening will result in millions of screening participants and LCS will soon generate millions of CT images. Enabling open-research access to large collections of chest CT images with associated clinical outcome data acquired from routine LCS is feasible and could accelerate and sustain the development of life-saving medical interventions (13, 14). Access to such an open research resource could empower a large community of innovators to develop and validate the reliability of relevant next generation AI tools at low cost. This resource could also be used for inexpensive and comprehensive chest CT imaging quality analysis of precision to reduce variability (15, 16). In this Perspective, the authors propose the creation of public utilities that will create and facilitate routine research access to huge LCS imaging/clinical registries.
Current status of research with LCS images
The utility of a chest CT registry is already evident. Upon completion of the NLST, 48,547 chest CT images and associated clinical outcomes data acquired through the trial were made available for open research (17). These data have been used to estimate the potential to detect other tobacco-related diseases, including atherosclerotic heart disease and emphysema (18, 19). The usual limitations of using screening trial data for population health purposes affect these data, including long time lags for curation and the selection biases associated with screening trial recruitment. In addition, rapid progression of CT technology quickly limits the value of older collections (17). The NLST images were acquired starting in 2002 and typically used 4-detector scans with up to 2.5 mm slice thickness. As a result, this CT image collection is now obsolete for many studies.
To optimally manage the routine ongoing care of screening subjects, all annually acquired chest CT screening images are already being stored (along with relevant clinical outcomes data) at the institution providing the screening care. We and others have reported that a cloud-based environment with appropriate permissions can interrogate the imaging and clinical data already stored at the host imaging site. This approach allows computational analyses of images from multiple sites in a secure, economical and efficient fashion (13, 14).
The value of a public utility for imaging and clinical data resource
The authors propose the creation of public utilities for screening images and associated clinical data that can be accessed and updated efficiently and economically using current cloud computing. Existing medical image and data cloud storage environments maintained with robust data security provisions mean that access to this screening participant data resource can be secure (13, 14). For several reasons including promoting the public good, the authors propose establishing a public utility model rather than defaulting to the traditional private equity model. With appropriate governance, the public utility model (public library) could also better mitigate public skepticism while ensuring equity in participation and accessibility (20, 21).
AI tool development and validation require relevant material which constitutes “ground truth.” Under a traditional approach, for imaging biomarkers, this would require that a large number of screening participants agree to provide access to their chest CT images and clinical information. Ideally, this process requires a regular flow over years of such data, since imaging and corresponding AI tools continue to rapidly evolve. Therefore, having a stable but dynamic source of imaging/clinical data is ideal for building AI tools and maintaining data provenance while accelerating tool development. However, the recruitment of participants under the traditional approach would be burdensome and expensive.
There is an urgent need for process efficiencies in the global implementation of screening. Currently, any chest CT scanner used for screening can acquire a vast amount of relevant imaging data, and the deployment of this new service could overwhelm today’s professional radiological workflow. Therefore, tools, especially those using AI, are needed to assist radiologists with chest CT screening review.
A particular need with tool development is to assist radiologists in performing computational analysis of serial annual chest CT scans to follow the status of disease progression in a screening participant to ensure productive early detection of potentially lethal chest diseases.
Systematically monitoring strategic imaging biomarkers in a fraction of the images potentially available through an LCS public utility could provide a powerful public health tool to optimize clinical management and to systematically assess the health of nations.
Emerging clinical benefits with coronary calcium detection on screening chest CT
A number of investigators have reported that chest CT imaging to measure coronary vascular calcifications is the most informative biomarker to manage the risk of cardiac mortality, often before the development of symptoms of heart disease (22, 23). Recommended interventions, such as lifestyle changes and statin administration, are already used to mitigate risk in this setting, with evidence that chest CT imaging does a better job of assessing cardiovascular risk than measuring LDL or any other lipid biomarker (22–24). CT-screen-detected coronary artery disease is already recognized by cardiology professional societies, with guideline recommendations for management with statins, which are low-cost generics. Further improvement in screening participants’ health can occur because LCS can detect the presence of coronary calcium, which might provide additional motivation to adhere to lifestyle interventions such as smoking cessation or increased physical activity (7, 8, 24).
Emerging clinical benefits with detection of emphysema
The ability to detect emphysema in LCS scans has already been reported based on retrospective analysis of NLST archival data (19). Asymptomatic screening subjects may benefit from more intensive smoking cessation interventions today, with a range of promising pharmaceutical interventions for early lung disease expected to follow (24).
Additional chest conditions detected on screening CT
Moving beyond the benefits we can achieve today, there are many serious conditions of the chest where we need better tools and predictions. Some of these are relatively common conditions, such as emphysema or atrial fibrillation, while others are rare, such as particular forms of interstitial lung disease. Today, CT scans often play a role in evaluating these conditions among high-risk or symptomatic patients (22–24). In Figure 1, a schematic developed based on the work of the International-Early Lung Cancer Action Project (I-ELCAP) highlights a number of chest diseases which can be visualized on a thoracic CT scan (24). As AI matures, we may learn to discriminate subtle patterns recognized on chest CT that will predict who is most likely to develop these or related conditions, perhaps years earlier than is currently possible. This will be possible with the proposed public utility partly because AI can identify relationships that are unexpected, but also because the public utility will allow researchers to follow large numbers of screening participants for years, which would be impractical with current, manual approaches.
Figure 1. Sites where disease entities can be detected with chest CT screening.
Barriers to sustained innovation with imaging
Researchers using AI or other big-data analytic approaches for tool development may need access to huge libraries of images from diverse settings and people, especially when developing tools for public health activities like screening. Today, however, collecting images is expensive and slow, and most AI tools have been trained or tested on mere hundreds of images, often from a single academic institution, and perhaps only using images from a single type of CT scanner. The proprietary nature of most biomedical image collections is another barrier to progress. These collections may have narrowly defined purposes and limited volumes, which may reflect the high prices demanded by major institutions for their image collections. These constraints mean that well-capitalized private firms have huge advantages over other researchers. After developing and launching AI tools using private image collections, firms financed by private capital have little incentive either to make their image collections available for other AI development or to update their tools.
For the last several decades, the hardware side of imaging (e.g., CT scanners) has rapidly evolved, fueled by improvements in microprocessor capabilities, detector sensitivity, cloud computing and the development of AI (7, 8, 12). Since all these components interact, their constant evolution means that the relationship between hardware and AI tools will remain dynamic, especially as the innovation cycle shortens. For example, improved imaging detail may enable even earlier detection of disease states and allow for less aggressive curative intervention, which in turn may improve a specific health outcome.
However, any change in an imaging component, such as improvements in image detectors or analysis capabilities, requires re-validation of tools to ensure the consistency of quality performance. To allow such validation cycles to be completed rapidly, routine access to appropriate image/data cases is required. Sustaining innovation in this important new screening setting drives urgency in defining approaches that provide large numbers of cases for image/data libraries. The dynamic nature of AI tool development in medical imaging will necessitate ongoing collection of new clinical data and images to sustain progress in early chest disease outcomes.
Proposed solutions
We offer a hypothetical (but realistic) example of how the public utility/AI might develop a product. Based on analyzing tens of thousands of historical cases accumulated over several years, a team of researchers might conclude that particular places in the lung, with certain patterns of change (barely visible to a radiologist), are at high risk of developing aggressive lung cancer. This finding would be published and incorporated into the image review software that radiologists use, and the software would flag images with the suspicious pattern for radiologist scrutiny. Recommended follow-up would be guided by the outcomes observed in the historical cases. Perhaps the individual will need re-scanning at 6-month intervals instead of annually, or the individual might be referred for additional workup.
Models of success with image and outcome data collection
Assembling the required volume of images associated with clinical outcomes to enable open-access research faces a number of obstacles. These include skepticism of screening participants about how their CT scans may be used, concerns over the high costs of curating the collected scans, and ensuring confidential access over the collection’s lifetime. Fortunately, there has been important progress in solving obstacles of privacy, data manipulation, procurement, and ownership.
Nonprofit research efforts have made excellent progress despite suffering from limited funds and limited access to medical images and related clinical data. Several organizations have laid the organizational and technical foundation for the public service utility model and examples include the Open-Source Imaging Consortium for Interstitial Lung Disease which has catalyzed progress with the spectrum of pulmonary fibrotic diseases (25). A growing inventory of valuable medical images is being assembled and curated by The Medical Imaging and Data Resource Center, which is working to create globally harmonized processes to support open research for medical imaging (26). Another image collection has been developed by the International-Early Lung Cancer Action Project (I-ELCAP), which is the largest and most influential of lung cancer screening organizations, with over 200,000 chest CT images collected longitudinally for more than 25 years (4, 10, 24). The leaders of these organizations have actively developed rigorous quality assurance processes, and the current image quality of the collections is excellent.
There are numerous examples of patient data sharing. The Regional Health Information Organizations (RHIOs, e.g., The Bronx Regional RHIO) are useful precedents (27). RHIOs attempt to fix healthcare fragmentation by gathering a patient’s medical and treatment information from multiple providers and assembling it to help a provider treat that patient. In some ways, our proposed Utility is simpler because its focus is research or process improvement, not providing real-time information to the provider treating the patient.
Technology progress
Organizations such as the Quantitative Imaging Biomarker Alliance (QIBA) of the Radiological Society of North America have demonstrated important technical and workflow approaches to assembling, curating, and evaluating large numbers of digital medical images to enable process optimization in measuring imaging biomarkers within defined constraints (28). Their publications have proven the feasibility and utility of many of the recommended acquisition parameters, allowing robust computer analysis, especially for measuring features such as volume consistently. A QIBA profile providing guidance on measurement issues for chest CT lung cancer screening has been published to support more reliable nodule volume quantitation (29).
Others have contributed to the understanding of how cloud-based federated image and data collection can comply with the national and international regulations, such as the European General Data Protection Regulation (11, 13, 14).
Progress in ethical framework
A framework supporting the ethics of public ownership of images as a health imperative to enable responsible AI progress has also been developed (30). This framework supports our vision of creating a public service utility (library) to assemble, curate, and facilitate AI and other research on donated images. Larson and colleagues have also suggested an ethical framework for more nimbly sharing de-identified clinical data for AI development that frames data sharing as a public good and holds data users to be data stewards with fiduciary responsibility for safeguarding the privacy of donors’ data when developing health-directed AI tools (31).
How a public utility might work
US regulations for today’s public utilities (water, electric, etc.) date back to the 19th century, but there is a significant movement to update these regulations for the technological realities of the 21st century (32). We propose that a public utility could “house” a virtual collection of LCS images and associated clinical information. The actual information would continue to be stored at the screening participant care sites, as physical access in one location is not needed for relevant research. There are many ways to implement this public utility, including, for example, a regional approach (state-based in a US context).
A US state-chartered corporation could have enabling legislation that would require that providers allow the utility to access images and related data from LCS delivered in the state. We see analogues in the state Maternal Mortality Review Committees, which have operated for almost a century to track pregnancy-related deaths (33). Some states already use provider taxes to fund public health efforts (34), and this could be a viable way to fund the public utility. Requirements for data capture, quality, and participation would be imposed on providers, and rules for research access (including fees) would be established. As states establish utilities, data sharing collaboratives could be established for multi-state efficiencies (35).
Even a smaller state can generate meaningful databases. For example, Kentucky, with about 4.6 million lives, has about 1.25% of the US population. If the US generates 2 million LCS scans in a year, Kentucky, on a pro-rata basis, could generate about 25,000 images per year. New York, a larger state with 20 million lives, or about 5.5% of the US population, could generate over 100,000 images. With such an approach, robust image repositories could be accreted and economically maintained, as with the IASLC cloud-based proof-of-concept example (14).
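The pro-rata estimate above is simple enough to reproduce directly. The short Python sketch below is only an illustration of that arithmetic: it uses the national scan volume and the approximate population shares quoted in the text, and the function and variable names are our own assumptions rather than part of any existing tool.

```python
# Hypothetical national volume, taken from the example in the text.
US_ANNUAL_LCS_SCANS = 2_000_000

# Approximate population shares as quoted in the text (not census-exact).
STATE_POPULATION_SHARE = {
    "Kentucky": 0.0125,
    "New York": 0.055,
}


def pro_rata_scans(national_scans: int, population_share: float) -> int:
    """Estimate a state's annual scan count as its share of the national total."""
    return round(national_scans * population_share)


estimates = {
    state: pro_rata_scans(US_ANNUAL_LCS_SCANS, share)
    for state, share in STATE_POPULATION_SHARE.items()
}

for state, n_scans in estimates.items():
    print(f"{state}: about {n_scans:,} scans per year")
```

A pro-rata split assumes that screening eligibility and uptake are uniform across states; actual volumes would depend on local smoking prevalence and screening participation.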
Figure 2 illustrates a possible state-wide cloud-based network of screening sites. Each site would have a pre-approved informatics construct to allow access to cloud delivered analysis software for already performed LCS images and associated, de-identified clinical data. In this hypothetical example, a possible hub-and-spoke arrangement is applied to Minnesota. The hub would direct AI algorithms to be performed on selected, relevant images housed in their “natural” sites. The number of screening sites could be scaled as required related to the purpose of the analysis.
Figure 2. Schematic of a possible state-wide cloud-based network of screening sites which allow access to cloud delivered analysis software for chest CT images and associated, de-identified metadata. Illustration of a possible hub-and-spoke arrangement applied to Minnesota. The hub would direct AI algorithms to be performed on selected, relevant images housed in their “natural” sites. The number of screening sites could be readily expanded if desired.
The proposed digital resource, with curated, high-quality imaging/clinical data cases tracked over many years, could be interrogated efficiently and economically. Patient-specific imaging or clinical data would reside in the screening center and only analysis results would be transmitted back to the cloud-hub. This model of a public service utility composed of imaging and clinical information could be an exemplar of a 21st century comprehensive medical registry, which maintains transparency in the development and validation of AI tools and objectively evaluates the quality and reliability of screening technology in clinical use (13, 14, 36).
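The hub-and-spoke data flow described above can be sketched in a few lines of Python. This is a toy model under stated assumptions, not an implementation of the systems reported in (13, 14): the class names, the `calcium_score` field, and the threshold of 100 are all hypothetical and chosen only to show that case-level data stay at each site while aggregate counts travel back to the hub.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class ScreeningCase:
    """A de-identified case held at a screening site (fields are hypothetical)."""
    case_id: str
    calcium_score: float  # stand-in for any image-derived biomarker


@dataclass(frozen=True)
class SiteSummary:
    """The only information a site returns to the hub: aggregate counts."""
    site_name: str
    n_cases: int
    n_flagged: int


class ScreeningSite:
    """A 'spoke': cases remain local; analyses run on site."""

    def __init__(self, name: str, cases: List[ScreeningCase]) -> None:
        self.name = name
        self._cases = cases  # case-level data are never sent to the hub

    def run(self, analysis: Callable[[ScreeningCase], bool]) -> SiteSummary:
        n_flagged = sum(1 for case in self._cases if analysis(case))
        return SiteSummary(self.name, len(self._cases), n_flagged)


class Hub:
    """The hub sends an analysis to each site and collects the summaries."""

    def __init__(self, sites: List[ScreeningSite]) -> None:
        self.sites = sites

    def dispatch(self, analysis: Callable[[ScreeningCase], bool]) -> List[SiteSummary]:
        return [site.run(analysis) for site in self.sites]


def pooled_rate(summaries: List[SiteSummary]) -> float:
    """Combine per-site counts into one overall flagged fraction."""
    total = sum(s.n_cases for s in summaries)
    flagged = sum(s.n_flagged for s in summaries)
    return flagged / total if total else 0.0


def elevated_calcium(case: ScreeningCase) -> bool:
    """Illustrative rule only; the cutoff is an assumption, not clinical guidance."""
    return case.calcium_score >= 100.0


site_a = ScreeningSite("Site A", [
    ScreeningCase("a1", 0.0),
    ScreeningCase("a2", 150.0),
    ScreeningCase("a3", 420.0),
])
site_b = ScreeningSite("Site B", [
    ScreeningCase("b1", 10.0),
    ScreeningCase("b2", 300.0),
])

hub = Hub([site_a, site_b])
summaries = hub.dispatch(elevated_calcium)
rate = pooled_rate(summaries)
print(summaries, rate)
```

In a real deployment, the analysis would be an approved software package delivered through the cloud environment, and governance rules would determine which summaries a site is permitted to release.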
The evolution of imaging data use for public health
While chest CT images include a remarkable amount of already-known clinically relevant information, there could be more actionable information in those images. Access to the already collected images and the associated clinical follow-up information represents an opportunity to economically mine information for the benefit of public health.
Consider that the first indication of the ubiquity of atherosclerotic disease emerged from systematic autopsy studies of US military fatalities from the Korean and Vietnam Wars (37, 38). Systematic computational analysis of archived images and clinical data from screening chest CT cases could quickly and economically yield novel atherosclerotic information, but in large and diverse cohorts of living LCS participants.
Further, when the first news of high mortality among nesting birds, related in part to pesticide exposure, emerged from England in the 1950s, there was no awareness of a comparable problem in North America. Only when a bird survey was done in North America was a stunning depletion of these birds discovered, which was thought to be related in part to thinning of eggshells (38). The CT imaging used in LCS can readily detect changes in calcium density in many tissues, including bird eggshells as well as bones. LCS images can be evaluated to provide early warning of compromised bone mineral density in already acquired chest CT images (39, 40). Potentially, systematic AI-radiomic analysis of already acquired chest CT screening images could be adapted to serve as an economical and time-efficient, 21st century, opportunistic warning or reconnaissance system for evolving biomedical threats.
Limitations
While the use of large imaging/data resources has many important potential applications, there are important considerations that limit this approach. Participation in screening is subject to the healthy volunteer effect, and there is potential for disease misclassification given some degree of heterogeneous data quality as well as regional variations in disease pathogenesis. Further, data integrity and privacy are challenges that must be continually monitored and managed with evolving best practices to maintain public confidence in the utility of providing access to screening information and images. Finally, measures to ensure equitable inclusion of images and data from all segments of society are essential to ensure that validated tools can perform as reliably as the public expects of a true public health resource.
Conclusion
LCS images include a remarkable amount of currently useful, clinically relevant information but there is likely even more information of clinical and operational value that is not being used. The imminent availability of millions of LCS images and associated clinical information creates logistical challenges, but the information has huge value. We offer a novel, public utility approach that will accelerate both operational and clinical breakthroughs efficiently and at sustainable, low cost.
References
1. Bray F, Laversanne M, Sung H, Ferlay J, Siegel RL, Soerjomataram I. Global cancer statistics 2022: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA Cancer J Clin. (2024) 74:229–63. doi: 10.3322/caac.21834, PMID: 38572751
2. National Lung Screening Trial Research Team. Reduced lung-cancer mortality with low-dose computed tomographic screening. N Engl J Med. (2011) 365:395–409. doi: 10.1056/NEJMoa1102873, PMID: 21714641, PMCID: PMC4356534
3. de Koning HJ, van der Aalst CM, de Jong PA, Scholten ET, Nackaerts K, Heuvelmans MA. Reduced lung-cancer mortality with volume CT screening in a randomized trial. N Engl J Med. (2020) 382:503–13. doi: 10.1056/NEJMoa1911793, PMID: 31995683
4. Henschke CI, Yip R, Shaham D, Markowitz S, Cervera Deval J, Zulueta JJ. A 20-year follow-up of the international early lung cancer action program (I-ELCAP). Radiology. (2023) 309:e231988. doi: 10.1148/radiol.231988, PMID: 37934099, PMCID: PMC10698500
5. Bonney A, Malouf R, Marchal C, Manners D, Fong KM, Marshall HM. Impact of low-dose computed tomography (LDCT) screening on lung cancer-related mortality. Cochrane Database Syst Rev. (2022) 2022. doi: 10.1002/14651858.CD013829.pub2, PMID: 35921047, PMCID: PMC9347663
6. Tao W, Tang X, Jiao X, Sayani A, Zhao J, Li W. Effectiveness of interventions for increasing lung cancer screening uptake: A systematic review and meta-analysis of randomized clinical trials. Prev Med. (2026) 203:108489. doi: 10.1016/j.ypmed.2025.108489, PMID: 41448280
7. Mulshine JL, Pyenson B, Healton C, Aldige C, Avila RS, Blum T. Paradigm shift in early detection: Lung cancer screening to comprehensive CT screening. Eur J Cancer. (2025) 218:115264. doi: 10.1016/j.ejca.2025.115264, PMID: 39904127
8. Yip R, Mulshine JL, Oudkerk M, Field J, Silva M, Yankelevitz DF. Current evidence of low-dose CT screening benefit. Eur J Cancer. (2025) 225:115570. doi: 10.1016/j.ejca.2025.115570, PMID: 40517528
