# A serum metabolite-based machine learning model predicts response to neoadjuvant immunotherapy in mismatch repair-deficient colorectal cancer

**Authors:** Tao Ma, Weili Zhang, Yuxi Pan, Guojie Long, Xiuwei Mi, Junfeng Jiang, Fan Bai, Hao Zhang, Tuo Hu, Ziyang Zeng, Weidong Pan

PMC · DOI: 10.3389/fonc.2026.1730155 · Frontiers in Oncology · 2026-02-19

## TL;DR

A machine learning model built on five serum metabolites predicts which patients with mismatch repair-deficient colorectal cancer will respond to neoadjuvant immunotherapy.

## Contribution

Development of a non-invasive serum metabolite-based machine learning model to predict immunotherapy response in dMMR colorectal cancer.

## Key findings

- A 5-metabolite predictive model (5-MPM) achieved AUC values of 0.85 in training and 0.88 in external validation.
- The model includes PGE2, tryptophan, arginine, citrulline, and histidine as key metabolites.
- SHAP analysis revealed individual metabolite contributions to model predictions.
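SHAP attributions are based on Shapley values, which split a model's output for one sample among its input features. As a minimal sketch of that idea, the code below computes exact Shapley values by brute-force enumeration over the five metabolites. This is an illustration only: `toy_score`, its weights, the sample values, and the zero baseline are all hypothetical and are not taken from the paper, and the paper's own SHAP computation for the random forest is likely to use a different (tree-specific) algorithm.

```python
from itertools import combinations
from math import factorial
from typing import Callable, Dict

Sample = Dict[str, float]


def shapley_values(model: Callable[[Sample], float],
                   x: Sample, baseline: Sample) -> Dict[str, float]:
    """Exact Shapley values; 'absent' features are set to their baseline value."""
    names = list(x)
    n = len(names)
    phi: Dict[str, float] = {}
    for i in names:
        others = [f for f in names if f != i]
        total = 0.0
        for k in range(n):  # coalition sizes 0 .. n-1
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                with_i = {f: x[f] if (f in subset or f == i) else baseline[f]
                          for f in names}
                without_i = {f: x[f] if f in subset else baseline[f]
                             for f in names}
                total += weight * (model(with_i) - model(without_i))
        phi[i] = total
    return phi


def toy_score(m: Sample) -> float:
    """Hypothetical scoring function with arbitrary weights (NOT the 5-MPM)."""
    return (0.8 * m["PGE2"] - 0.5 * m["tryptophan"]
            + 0.3 * m["arginine"] * m["citrulline"] + 0.2 * m["histidine"])


# Hypothetical standardized metabolite levels for one patient.
patient = {"PGE2": 1.0, "tryptophan": -0.5, "arginine": 2.0,
           "citrulline": 1.0, "histidine": 0.5}
baseline = {name: 0.0 for name in patient}

phi = shapley_values(toy_score, patient, baseline)
for name, value in phi.items():
    print(f"{name}: {value:+.3f}")
```

The attributions sum to the difference between the score for the patient and the score for the baseline, and the arginine-citrulline interaction term is shared equally between those two features. Enumeration costs grow exponentially with the number of features, which is tolerable for five inputs but is the reason practical SHAP tooling relies on approximations or model-specific shortcuts.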

## Abstract

Colorectal cancer (CRC) with microsatellite instability-high (MSI-H) or mismatch repair-deficient (dMMR) status shows significant sensitivity to immune checkpoint inhibitors (ICIs). However, a considerable proportion of patients still exhibit primary or acquired resistance to ICIs. To date, efficient, non-invasive biomarkers that accurately predict immunotherapy efficacy remain unavailable.

In this multicenter study, we employed liquid chromatography–mass spectrometry (LC–MS) and enzyme-linked immunosorbent assay (ELISA) to identify and validate serum metabolites associated with response to immunotherapy. Using machine learning algorithms, we constructed a random forest predictive model based on a panel of five metabolites. This model, termed the 5-Metabolite Predictive Model (5-MPM), incorporates prostaglandin E2 (PGE2), tryptophan, arginine, citrulline, and histidine.

The 5-MPM demonstrated robust predictive performance in both the training cohort and the external validation cohort, with AUC values of 0.85 and 0.88, respectively. SHAP analysis elucidated the contribution of each metabolite to model predictions. Integrating the five metabolites with metastasis stage did not further improve the predictive performance of the model.
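For readers less familiar with the AUC metric reported here, it can be read as the probability that a randomly chosen responder receives a higher predicted score than a randomly chosen non-responder. The sketch below computes it directly from that definition. The labels and scores are made-up illustrative values, not data from the study, and `roc_auc` is a hypothetical helper name.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC via pairwise comparison: P(score of responder > score of non-responder).

    Ties between a responder and a non-responder count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one responder and one non-responder")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical example: 1 = responder, 0 = non-responder.
labels = [1, 1, 1, 0, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.7, 0.2, 0.6]

auc = roc_auc(labels, scores)
print(f"AUC = {auc:.3f}")
```

In this toy example 14 of the 16 responder/non-responder pairs are ranked correctly, giving an AUC of 0.875. An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation.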

This study provides the first systematic characterization of metabolic reprogramming in dMMR colorectal cancer with differing responses to immunotherapy, and establishes a non-invasive, high-precision predictive tool that offers a new basis for individualized therapeutic decision-making.

## Linked entities

- **Chemicals:** PGE2 (PubChem CID 5280360), tryptophan (PubChem CID 1148), arginine (PubChem CID 232), citrulline (PubChem CID 833), histidine (PubChem CID 773)
- **Diseases:** colorectal cancer (MONDO:0005575)

## Full-text entities

- **Genes:** ASS1 (argininosuccinate synthase 1) [NCBI Gene 445] {aka ASS, CTLN1}, CD8A (CD8 subunit alpha) [NCBI Gene 925] {aka CD8, CD8alpha, IMD116, Leu2, p32}, IL2RB (interleukin 2 receptor subunit beta) [NCBI Gene 3560] {aka CD122, IL15RB, IMD63, P70-75}, PTGES (prostaglandin E synthase) [NCBI Gene 9536] {aka MGST-IV, MGST1-L1, MGST1L1, MPGES, PGES, PIG12}, TCF7 (transcription factor 7) [NCBI Gene 6932] {aka TCF-1}, TNF (tumor necrosis factor) [NCBI Gene 7124] {aka DIF, IMD127, TNF-alpha, TNFA, TNFSF2, TNLG1F}, SPATA2 (spermatogenesis associated 2) [NCBI Gene 9825] {aka PD1, PPP1R145, tamo}, STAT1 (signal transducer and activator of transcription 1) [NCBI Gene 6772] {aka CANDF7, IMD31A, IMD31B, IMD31C, ISGF-3, STAT91}, IFNG (interferon gamma) [NCBI Gene 3458] {aka IFG, IFI, IMD69}, JAK2 (Janus kinase 2) [NCBI Gene 3717] {aka JTK10}, SHROOM4 (shroom family member 4) [NCBI Gene 57477] {aka MRXSSDS, SHAP, shrm4}, IL2 (interleukin 2) [NCBI Gene 3558] {aka IL-2, TCGF, lymphokine}, CD274 (CD274 molecule) [NCBI Gene 29126] {aka ADMIO5, B7-H, B7H1, PD-L1, PDCD1L1, PDCD1LG1}
- **Diseases:** mismatch (MESH:C536928), MSI-H (MESH:D053842), Cancer (MESH:D009369), deficient (MESH:D007153), SAH (MESH:D013345), CRC (MESH:D015179), inflammatory (MESH:D007249), metastasis (MESH:D009362), non-small cell lung cancer (MESH:D002289), H (MESH:D000848), triple-negative breast cancer (MESH:D064726)
- **Chemicals:** phenylalanine (MESH:D010649), TCA (MESH:D014238), Arginine (MESH:D001120), lactate (MESH:D019344), amino acid (MESH:D000596), isoleucine (MESH:D007532), Histidine (MESH:D006639), eicosanoid (MESH:D015777), 5-MPM (-), histamine (MESH:D006632), nitric oxide (MESH:D009569), SAH (MESH:D012435), threonine (MESH:D013912), sarcosine (MESH:D012521), MPM (MESH:C083299), kynurenine (MESH:D007737), Tryptophan (MESH:D014364), Citrulline (MESH:D002956), arachidonic acid (MESH:D016718), PGE2 (MESH:D015232)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12960109/full.md

## Figures

6 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12960109/full.md

## References

32 references — full list in the complete paper: https://tomesphere.com/paper/PMC12960109/full.md

---
Source: https://tomesphere.com/paper/PMC12960109