# PRIME: an interpretable artificial intelligence model based on liquid biopsy improves prediction of progression risk in non-small cell lung cancer

**Authors:** Yu Wang, Yong-Bo Xiang, Xiao-Wei Chen, Tao Zhang, Jian-Yang Wang, Wen-Yang Liu, Lei Deng, Lu-Hua Wang, Shu-Geng Gao, Nan Bi

PMC · DOI: 10.1186/s40779-025-00679-z · Military Medical Research · 2026-01-06

## TL;DR

PRIME is an interpretable AI model that uses liquid biopsy data to better predict the risk of cancer progression in non-small cell lung cancer patients.

## Contribution

PRIME integrates clinical-genomic predictors with machine learning to improve progression risk prediction and enable personalized therapy decisions.

## Key findings

- PRIME outperformed single biomarkers and clinical signatures in predicting treatment failure risk (AUC = 0.82 in validation).
- MRD, treatment modality, and pre-treatment ctDNA were top contributors to model predictions.
- KEAP1, STK11, and CDKN2A mutations were confirmed as poor prognostic markers and were associated with a suppressive tumor microenvironment and attenuated humoral immunity.

## Abstract

Despite the predictive value of circulating tumor DNA (ctDNA) minimal residual disease (MRD), accurately predicting failure risk after curative-intent treatment in patients with early-stage or localized non-small cell lung cancer (NSCLC), and thereby guiding personalized therapy, remains challenging. This study aimed to develop and validate an interpretable artificial intelligence-assisted model using global data resources.

Liquid biopsy data, blood-based genomic alterations, clinicopathological features, and survival outcomes of stage I–III NSCLC patients who underwent surgery or definitive chemoradiotherapy were collected from 6 cohorts. PRIME (Progression Risk prediction by Interpretable Machine learning on ctDNA-MRD, Mutations, and clinical-therapeutic features) was trained with 6 machine learning algorithms across 4 cohorts and validated in 2 independent cohorts. Model performance was evaluated by the area under the curve (AUC), and predictions were interpreted with SHapley Additive exPlanations (SHAP). Whole-exome sequencing (WES) or whole-genome sequencing (WGS) of tumor tissue from 430 stage II–III NSCLC patients and RNA-sequencing (RNA-seq) data from 1149 subjects, sourced from The Cancer Genome Atlas, were used to validate the prognostic effect of mutations identified in peripheral blood and to investigate the underlying mechanisms.
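
For readers unfamiliar with the evaluation metric, the sketch below shows one common way to compute an AUC and a percentile-bootstrap 95% confidence interval. It is an illustration only: the abstract does not state how the intervals were derived, the data are synthetic, and the names `auc` and `bootstrap_ci` are hypothetical helpers rather than anything from the paper.

```python
import random


def auc(labels, scores):
    """AUC as the chance that a random positive scores above a random negative.

    Ties between a positive and a negative count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_ci(labels, scores, n_boot=500, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the AUC (an assumed method, see lead-in)."""
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if len(set(ys)) < 2:
            continue  # resample contains a single class, so AUC is undefined
        stats.append(auc(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Synthetic cohort: 1 = progression, 0 = no progression (not real patient data).
data_rng = random.Random(42)
labels = [1 if data_rng.random() < 0.4 else 0 for _ in range(120)]
scores = [data_rng.gauss(0.65 if y else 0.40, 0.2) for y in labels]

point = auc(labels, scores)
low, high = bootstrap_ci(labels, scores)
print(f"AUC = {point:.2f}, 95% CI {low:.2f}-{high:.2f}")
```

The pairwise definition used here is equivalent to the area under the ROC curve and keeps the example dependency-free; for large cohorts a rank-based implementation would be faster.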

A global dataset encompassing 781 blood samples from 493 patients was analyzed. Clinical stage, pre-treatment ctDNA, post-treatment MRD, blood-based Kelch-like ECH-associated protein 1 (KEAP1), serine/threonine kinase 11 (STK11), and cyclin-dependent kinase inhibitor 2A (CDKN2A) mutations, and treatment modality were significantly associated with the risk of disease progression and were therefore included in model training. WES/WGS and RNA-seq confirmed the poor prognostic effect of KEAP1, STK11, and CDKN2A mutations, which were characterized by a suppressive tumor microenvironment and attenuated humoral immunity. The neural network (NN) model exhibited optimal prediction of treatment failure risk in the training (AUC = 0.85, 95% CI 0.81–0.89) and validation sets (AUC = 0.82, 95% CI 0.74–0.89). SHAP analysis indicated that MRD (+0.306), treatment modality (+0.128), and pre-treatment ctDNA (+0.043) were the top 3 contributors. NN-PRIME outperformed single liquid biopsy biomarkers and clinical-therapeutic signatures, and demonstrated consistent robustness across different clinical scenarios. High-risk patients identified by NN-PRIME had poorer prognoses but derived significant benefits from adjuvant therapy after surgery.
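
To make the SHAP attributions above more concrete, here is a minimal sketch of exact Shapley values for a toy model over the seven predictors named in the abstract. Everything numeric is an assumption: the logistic weights, the bias, the binary encoding, and the all-zero baseline are hypothetical and are not the NN-PRIME model. It also uses a single reference patient as the baseline, which is a simplification of what the SHAP library does with a background dataset.

```python
from itertools import combinations
from math import exp, factorial

FEATURES = ["stage", "pre_ctdna", "mrd", "keap1", "stk11", "cdkn2a", "modality"]

# Hypothetical weights for illustration only; not taken from the paper.
WEIGHTS = {"stage": 0.4, "pre_ctdna": 0.6, "mrd": 2.0, "keap1": 0.7,
           "stk11": 0.5, "cdkn2a": 0.5, "modality": 0.9}
BIAS = -1.5


def predict(x):
    """Toy logistic risk model standing in for the trained network."""
    z = BIAS + sum(WEIGHTS[f] * x[f] for f in FEATURES)
    return 1.0 / (1.0 + exp(-z))


def shapley_values(x, baseline):
    """Exact Shapley values by enumerating every coalition of the other features."""
    n = len(FEATURES)

    def value(coalition):
        # Features in the coalition take the patient's value, the rest the baseline's.
        mixed = {f: (x[f] if f in coalition else baseline[f]) for f in FEATURES}
        return predict(mixed)

    phi = {}
    for f in FEATURES:
        others = [g for g in FEATURES if g != f]
        total = 0.0
        for k in range(n):  # coalition sizes 0 .. n-1
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = set(subset)
                total += weight * (value(s | {f}) - value(s))
        phi[f] = total
    return phi


# Hypothetical patient: MRD-positive, ctDNA-positive, KEAP1-mutated, modality flag set.
patient = {"stage": 0, "pre_ctdna": 1, "mrd": 1, "keap1": 1,
           "stk11": 0, "cdkn2a": 0, "modality": 1}
baseline = {f: 0 for f in FEATURES}

phi = shapley_values(patient, baseline)
for name, contribution in sorted(phi.items(), key=lambda kv: -kv[1]):
    print(f"{name:10s} {contribution:+.3f}")
```

The contributions sum to the difference between the patient's predicted risk and the baseline's, which is the additivity property that lets signed values such as those reported above be read as each feature's share of a prediction. Exact enumeration is only practical for a handful of features; with seven predictors it needs 128 coalitions per patient.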

As an interpretable model integrating readily-accessible and crucial clinical-genomic predictors, PRIME achieves enhanced performance, allowing for early outcome prediction, refined risk stratification, and personalized clinical decision-making.

The online version contains supplementary material available at 10.1186/s40779-025-00679-z.

## Linked entities

- **Genes:** KEAP1 (kelch like ECH associated protein 1) [NCBI Gene 9817], STK11 (serine/threonine kinase 11) [NCBI Gene 6794], CDKN2A (cyclin dependent kinase inhibitor 2A) [NCBI Gene 1029]
- **Diseases:** non-small cell lung cancer (MONDO:0005233)

## Full-text entities

- **Genes:** STK11 (serine/threonine kinase 11) [NCBI Gene 6794] {aka LKB1, PJS, hLKB1}, KEAP1 (kelch like ECH associated protein 1) [NCBI Gene 9817] {aka INrf2, KLHL19}, CDKN2A (cyclin dependent kinase inhibitor 2A) [NCBI Gene 1029] {aka ARF, CAI2, CDK4I, CDKN2, CMM2, INK4}
- **Diseases:** stage I-III (MESH:D062706), Cancer (MESH:D009369), NSCLC (MESH:D002289)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12771999/full.md

## Figures

6 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12771999/full.md

---
Source: https://tomesphere.com/paper/PMC12771999