# Translational impact of machine learning-driven predictive modeling with pathway-based plasma metabolomic biomarkers for lung cancer detection

**Authors:** Eyad Himdiat, Jean-François Haince, Rashid A. Bux, Guoyu Huang, Paramjit S. Tappia, Bram Ramjiawan, Maria Vaida

PMC · DOI: 10.3389/fonc.2025.1718863 · 2026-01-22

## TL;DR

A machine learning model built on plasma metabolites and metabolic pathway features detects lung cancer with high accuracy, offering a promising noninvasive screening method.

## Contribution

A novel pathway-informed machine learning pipeline for lung cancer detection using plasma metabolomic data is developed and validated.

## Key findings

- A Support Vector Machine (SVM) model using 41 predictors achieved 97% accuracy and a ROC AUC of 0.97 for lung cancer detection.
- Glutaminolysis and tryptophan metabolism provided the strongest biological indicators for the model.
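The cohort described in the abstract is imbalanced (586 cancer vs 214 control samples), so accuracy and ROC AUC answer different questions: a model with no signal at all can still score well on accuracy, while ROC AUC is threshold-free and sits at 0.5 for an uninformative score. The minimal sketch below, using only the Python standard library, computes both for a hypothetical baseline that calls every sample "cancer". It is an illustration of the two metrics, not the authors' evaluation code.

```python
from itertools import product


def roc_auc(labels, scores):
    """ROC AUC as the probability that a random positive outscores a random
    negative (Mann-Whitney formulation); tied pairs count as 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p, n in product(pos, neg):  # O(P * N) pairwise comparison
        if p > n:
            wins += 1.0
        elif p == n:
            wins += 0.5
    return wins / (len(pos) * len(neg))


def accuracy(labels, preds):
    """Fraction of predictions that match the true labels."""
    return sum(y == p for y, p in zip(labels, preds)) / len(labels)


# Class counts from the abstract: 586 cancer (label 1), 214 controls (label 0).
labels = [1] * 586 + [0] * 214

# Hypothetical trivial baseline: predict "cancer" for everyone, same score for all.
always_cancer = [1] * len(labels)
print(accuracy(labels, always_cancer))        # 0.7325
print(roc_auc(labels, [0.5] * len(labels)))   # 0.5
```

The baseline reaches 73.25% accuracy with zero discriminative ability, which is why the reported 97% accuracy is best read together with the ROC AUC of 0.97.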

## Abstract

Detecting lung cancer at an early stage remains essential for improving survival, but current diagnostic approaches have limited sensitivity and often suffer from poor generalizability and a lack of interpretability.

This retrospective study develops a machine learning pipeline that integrates plasma metabolite measurements with metabolic pathway information to derive a pathway-informed biomarker panel for lung cancer screening.
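The summary does not state how the pathway features were derived from individual metabolites. One common way to turn concentrations into a pathway-level feature is a product-to-substrate ratio (the kynurenine/tryptophan ratio is a familiar example). The sketch below assumes that approach; the `PATHWAY_RATIOS` mapping, the `pathway_features` helper and the concentrations are all hypothetical and are not taken from the paper.

```python
import math

# Hypothetical pathway definitions as (substrate, product) pairs.
# The paper's actual pathway derivation is not described in this summary.
PATHWAY_RATIOS = {
    "glutaminolysis": ("glutamine", "glutamate"),
    "tryptophan_metabolism": ("tryptophan", "kynurenine"),
}


def pathway_features(sample, ratios=PATHWAY_RATIOS):
    """Return log2(product / substrate) for each hypothetical pathway."""
    feats = {}
    for name, (substrate, product) in ratios.items():
        s, p = sample[substrate], sample[product]
        if s <= 0 or p <= 0:
            raise ValueError(f"non-positive concentration in pathway {name!r}")
        feats[name] = math.log2(p / s)
    return feats


# Toy plasma concentrations (illustrative values, not study data).
sample = {"glutamine": 512.0, "glutamate": 64.0, "tryptophan": 64.0, "kynurenine": 2.0}
print(pathway_features(sample))
# {'glutaminolysis': -3.0, 'tryptophan_metabolism': -5.0}
```

The log2 scale is a design choice: a doubling of the ratio shifts the feature by +1 and a halving by -1, so increases and decreases are treated symmetrically before the features reach a classifier.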

Using 800 plasma samples from the Cooperative Human Tissue Network biobank (586 cancer, 214 controls) with 166 metabolites and 60 derived pathways, we identified a subset of 41 predictors (9 pathways, 26 metabolites, 6 demographic variables) through an ensemble feature selection framework. Of the several models tested, the Support Vector Machine (SVM) performed best, delivering an overall accuracy of 97% and a ROC AUC of 0.97 on this subset. After pathway-related metabolites were eliminated from the initial dataset, feature selection reduced the number of variables from 170 to 41, retaining biological relevance while limiting overfitting. In the pathway analysis, glutaminolysis and tryptophan metabolism yielded the strongest biological indicators.
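The abstract does not specify which selectors make up the ensemble or how their outputs are combined. As a hedged illustration, the sketch below swaps in Borda-count rank aggregation, one common way to merge feature rankings: each hypothetical selector ranks the features, and features that rank highly across selectors are kept. The `borda_aggregate` helper, feature names and rankings are made up for illustration, and the downstream SVM step is not shown.

```python
from collections import defaultdict


def borda_aggregate(rankings, top_k):
    """Combine several feature rankings (best first) by Borda count.

    Each ranking awards len(ranking) - position points to a feature;
    a feature absent from a ranking gets 0 points from it. Ties in the
    total score are broken alphabetically so the result is deterministic.
    """
    scores = defaultdict(int)
    for ranking in rankings:
        n = len(ranking)
        for pos, feat in enumerate(ranking):
            scores[feat] += n - pos
    ordered = sorted(scores, key=lambda f: (-scores[f], f))
    return ordered[:top_k]


# Hypothetical rankings from three different selectors (for example a
# univariate test, an L1-penalized model and tree-based importance).
rankings = [
    ["glutaminolysis", "tryptophan_metabolism", "age", "metabolite_a"],
    ["tryptophan_metabolism", "glutaminolysis", "metabolite_b", "age"],
    ["glutaminolysis", "metabolite_a", "tryptophan_metabolism", "smoking_status"],
]
print(borda_aggregate(rankings, top_k=3))
# ['glutaminolysis', 'tryptophan_metabolism', 'metabolite_a']
```

Aggregating across selectors favors features that are consistently ranked high rather than those that one method happens to prefer, which is the usual motivation for ensemble selection when the goal is a compact, stable panel.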

This noninvasive, interpretable approach based on a plasma panel could facilitate cost-effective, early-stage lung cancer screening in high-risk populations, with strong translational potential in clinical settings. Future work should focus on multi-center and prospective validation, assessment of longitudinal biomarker stability, and integration with other omics data to further advance precision oncology, ultimately improving early detection and patient outcomes in lung cancer management.

## Linked entities

- **Diseases:** lung cancer (MONDO:0005138)

## Full-text entities

- **Diseases:** cancer (MESH:D009369), lung cancer (MESH:D008175)
- **Chemicals:** tryptophan (MESH:D014364)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Figures

6 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12872517/full.md

---
Source: https://tomesphere.com/paper/PMC12872517