# Machine Learning Reveals the Association Between Gene Expression and Immune Infiltration in Colorectal Cancer: A Comprehensive Study From Single‐Cell to Survival Analysis

**Authors:** Xiaoxin Duan, Shen Huang, Yan Zhou, Tao Yang, Jiaqi Liu, Hudan Song, Pingliang Sun

PMC · DOI: 10.1111/jcmm.71049 · Journal of Cellular and Molecular Medicine · 2026-03-02

## TL;DR

This study uses machine learning on single-cell data to uncover gene expression patterns linked to immune cell infiltration in colorectal cancer, identifying new biomarkers and subtypes for better treatment strategies.

## Contribution

A novel computational framework integrating machine learning methods to analyze single-cell RNA sequencing data in CRC, revealing new molecular subtypes and immune-related biomarkers.

## Key findings

- Two novel molecular subtypes with different outcomes were identified using unsupervised clustering (p = 0.049).
- CD19, MAP2, CALB2, and TGFB2 were identified as key biomarkers involved in immune modulation.
- New biological pathways related to immune response were revealed through gene enrichment analysis of subgroups.
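
The p-value reported for the two subtypes comes from a survival comparison. This summary does not say which test the authors used, so the sketch below assumes a two-group log-rank test, a common choice for comparing survival curves. The function name `logrank_test` and the toy follow-up times are hypothetical and are not taken from the paper.

```python
import math


def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test. Returns (chi2, p_value).

    times_*  : follow-up times for each patient in the group
    events_* : 1 if the event (e.g. death) was observed, 0 if censored
    """
    # Pool all observations as (time, event, group); group 0 = A, 1 = B.
    data = [(t, e, 0) for t, e in zip(times_a, events_a)]
    data += [(t, e, 1) for t, e in zip(times_b, events_b)]

    # Only time points with at least one observed event contribute.
    event_times = sorted({t for t, e, _ in data if e})

    observed_minus_expected = 0.0
    variance = 0.0
    for t in event_times:
        # Patients still at risk just before time t, per group.
        n_a = sum(1 for ti, _, g in data if ti >= t and g == 0)
        n_b = sum(1 for ti, _, g in data if ti >= t and g == 1)
        # Events occurring exactly at time t, per group.
        d_a = sum(1 for ti, e, g in data if ti == t and e and g == 0)
        d_b = sum(1 for ti, e, g in data if ti == t and e and g == 1)
        n, d = n_a + n_b, d_a + d_b

        # Expected events in group A under the null hypothesis.
        observed_minus_expected += d_a - d * n_a / n
        if n > 1:
            # Hypergeometric variance of d_a at this time point.
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)

    if variance == 0:
        return 0.0, 1.0
    chi2 = observed_minus_expected ** 2 / variance
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Toy data (made up): subtype A has events early, subtype B late.
chi2, p = logrank_test([1, 2, 3, 4, 5], [1, 1, 1, 1, 1],
                       [10, 11, 12, 13, 14], [1, 1, 1, 1, 1])
print(f"chi2 = {chi2:.2f}, p = {p:.4f}")
```

A p-value just under 0.05, such as the 0.049 reported here, indicates a borderline difference under the conventional threshold, so it is worth reading alongside the survival curves in the full paper.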

## Abstract

Colorectal cancer (CRC) is one of the most common causes of cancer mortality globally. Immune cell infiltration patterns in the tumour microenvironment (TME) are critical to treatment outcomes, but the molecular mechanisms that regulate this process are still poorly understood. We applied machine learning to single‐cell RNA sequencing analysis to unravel the complex interaction between gene expression profiles and immune cell infiltration in CRC. We present a new computational framework that integrates different machine learning methods to analyse single‐cell RNA sequencing data from CRC patients. The system leverages unsupervised clustering, survival analysis and gene‐set enrichment analysis to pinpoint principal molecular signatures. CIBERSORT and ESTIMATE were used for immune cell quantification, whereas UMAP and t‐SNE were used for high‐dimensional data visualisation and pattern discovery. Our analyses uncovered gene expression signatures that were closely associated with immune cell infiltration patterns in CRC. Using unsupervised clustering, we discovered two novel molecular subtypes that displayed markedly different outcomes (p = 0.049). We identified CD19, MAP2, CALB2 and TGFB2 as key biomarkers involved in immune modulation. In addition, gene enrichment analysis of these subgroups revealed new biological pathways involved in the immune response. Our proposed models showed strong predictive capabilities, as verified by ROC curve analysis. Single‐cell analysis identified previously uncharacterized interactions between specific immune cell populations and tumour cells, thereby uncovering novel immune evasion mechanisms and potential immunotherapy targets within the TME. Our results uncover novel candidate biomarkers for predicting response to immunotherapy and highlight molecular profiles that could support guided treatment approaches. The predictive models derived here have the potential to be implemented in clinical practice to support decision‐making in CRC management.
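
The abstract states that the models were verified by ROC curve analysis but gives no AUC values in this summary. As a minimal sketch of what that verification measures, the area under the ROC curve can be computed as the fraction of positive/negative pairs that the model ranks correctly. The function name `roc_auc` and the labels and scores below are hypothetical and are not data from the paper.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise ranking.

    labels : 1 for positive cases, 0 for negative cases
    scores : model output, higher meaning "more likely positive"
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")

    # Count pairs where the positive case outscores the negative one;
    # tied scores count as half a correct ranking.
    correct = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(pos) * len(neg))


# Toy example (made up): three of the four positive/negative pairs
# are ranked correctly.
print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation. The pairwise form is quadratic in the number of samples, which is fine for illustration but slow on large cohorts.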

## Linked entities

- **Genes:** CD19 (CD19 molecule) [NCBI Gene 930], MAP2 (microtubule associated protein 2) [NCBI Gene 4133], CALB2 (calbindin 2) [NCBI Gene 794], TGFB2 (transforming growth factor beta 2) [NCBI Gene 7042]
- **Diseases:** colorectal cancer (MONDO:0005575)

## Full-text entities

- **Genes:** HAVCR2 (hepatitis A virus cellular receptor 2) [NCBI Gene 84868] {aka CD366, HAVcr-2, KIM-3, SPTCL, TIM3, TIMD-3}, PTPRC (protein tyrosine phosphatase receptor type C) [NCBI Gene 5788] {aka B220, CD45, CD45R, GP180, IMD105, L-CA}, CD44 (CD44 molecule (IN blood group)) [NCBI Gene 960] {aka CDW44, CSPG8, ECM-III, ECMR-III, H-CAM, HCELL}, FPR1 (formyl peptide receptor 1) [NCBI Gene 2357] {aka FMLP, FPR}, LGALS9 (galectin 9) [NCBI Gene 3965] {aka HUAT, LGALS9A}, ANXA1 (annexin A1) [NCBI Gene 301] {aka ANX1, LPC1}, CALB2 (calbindin 2) [NCBI Gene 794] {aka CAB29, CAL2, CR}, TGFB2 (transforming growth factor beta 2) [NCBI Gene 7042] {aka CAEND2, G-TSF, LDS4, TGF-beta2}, CD8A (CD8 subunit alpha) [NCBI Gene 925] {aka CD8, CD8alpha, IMD116, Leu2, p32}, TGFB1 (transforming growth factor beta 1) [NCBI Gene 7040] {aka CAEND1, CED, DPD1, IBDIMDE, LAP, TGF-beta1}, TGFBI (transforming growth factor beta induced) [NCBI Gene 7045] {aka BIGH3, CDB1, CDG2, CDGG1, CSD, CSD1}, MIF (macrophage migration inhibitory factor) [NCBI Gene 4282] {aka GIF, GLIF, MMIF}, MAP2 (microtubule associated protein 2) [NCBI Gene 4133] {aka MAP-2, MAP2A, MAP2B, MAP2C}, CXCR4 (C-X-C motif chemokine receptor 4) [NCBI Gene 7852] {aka CD184, D2S201E, FB22, HM89, HSY3RR, LCR1}, CD74 (CD74 molecule) [NCBI Gene 972] {aka CLIP, DHLAG, HLADG, II, Ia-GAMMA, p33}, CD4 (CD4 molecule) [NCBI Gene 920] {aka CD4mut, IMD79, Leu-3, OKT4D, T4}, CD19 (CD19 molecule) [NCBI Gene 930] {aka B4, CVID3}
- **Diseases:** Cancer (MESH:D009369), inflammation (MESH:D007249), tumorigenesis (MESH:D063646), CRC (MESH:D015179), metastasis (MESH:D009362)
- **Chemicals:** arachidonic acid (MESH:D016718)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12953194/full.md

## Figures

11 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12953194/full.md

## References

42 references — full list in the complete paper: https://tomesphere.com/paper/PMC12953194/full.md

---
Source: https://tomesphere.com/paper/PMC12953194