# Decoding temporal heterogeneity in NSCLC through machine learning and prognostic model construction

**Authors:** Junpeng Cheng, Meizhu Xiao, Qingkang Meng, Min Zhang, Denan Zhang, Lei Liu, Qing Jin, Zhijin Fu, Yanjiao Li, Xiujie Chen, Hongbo Xie

PMC · DOI: 10.1186/s12957-024-03435-0 · World Journal of Surgical Oncology · 2024-06-13

## TL;DR

This study uses machine learning to identify key genes and pathways involved in the progression of non-small cell lung cancer, offering potential targets for diagnosis and treatment.

## Contribution

A novel workflow that combines consensus non-negative matrix factorization (cNMF) and XGBoost to identify temporally heterogeneous biomarkers in NSCLC and to build a risk score model.

## Key findings

- Malignant NSCLC cells were classified into three functional modules: metabolic reprogramming, cell cycle, and cell stemness.
- Expression of genes such as CHCHD2, GAPDH, and CD24 correlated strongly with NSCLC malignant evolution at the single-cell level, and these genes were validated with histological data.
- A risk score model based on eight genes was validated using GEO data, showing potential for clinical application.

## Abstract

Non-small cell lung cancer (NSCLC) is a prevalent and heterogeneous disease with significant genomic variations between the early and advanced stages. The identification of key genes and pathways driving NSCLC tumor progression is critical for improving the diagnosis and treatment outcomes of this disease.

In this study, we conducted single-cell transcriptome analysis on 93,406 cells from 22 NSCLC patients to characterize malignant NSCLC cells. Using cNMF, we classified these cells into distinct modules, identifying the diverse molecular profiles within NSCLC. Through pseudotime analysis, we delineated temporal gene expression changes during NSCLC evolution and identified genes associated with disease progression. Using an XGBoost model, we assessed the importance of these genes along the pseudotime trajectory. Our findings were validated using transcriptome sequencing data from The Cancer Genome Atlas (TCGA), and LASSO regression was applied to refine the selection of characteristic genes. Finally, we established a risk score model based on these genes, providing a potential tool for cancer risk assessment and personalized treatment strategies.
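To make the idea of scoring genes against a pseudotime trajectory concrete, here is a minimal sketch. It does not reproduce the paper's XGBoost step. It swaps in a much simpler stand-in, Spearman rank correlation between each gene's expression and pseudotime, purely to illustrate ranking genes by how strongly they track the trajectory. The gene names (`GENE_A`, `GENE_B`, `GENE_C`), the pseudotime values, and the expression values are all hypothetical toy data, not results from the study.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either input is constant."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy) if sx and sy else 0.0


def spearman(x, y):
    """Spearman correlation = Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def rank_genes_by_pseudotime(expression, pseudotime):
    """Sort genes by absolute Spearman correlation with pseudotime."""
    scores = {gene: spearman(vals, pseudotime) for gene, vals in expression.items()}
    return sorted(scores.items(), key=lambda kv: abs(kv[1]), reverse=True)


# Hypothetical toy data: six cells ordered along a trajectory.
pseudotime = [0.0, 0.1, 0.3, 0.5, 0.7, 0.9]
expression = {
    "GENE_A": [1.0, 1.5, 2.1, 3.0, 4.2, 5.0],  # rises along pseudotime
    "GENE_B": [4.0, 3.8, 3.1, 2.0, 1.2, 0.5],  # falls along pseudotime
    "GENE_C": [2.0, 3.0, 1.0, 3.0, 2.0, 1.0],  # no clear trend
}

ranking = rank_genes_by_pseudotime(expression, pseudotime)
for gene, rho in ranking:
    print(f"{gene}: rho = {rho:+.2f}")
```

A tree-based model such as XGBoost can capture non-monotonic and interacting effects that a rank correlation misses, so this sketch should be read only as an illustration of the ranking idea.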

We used cNMF to classify malignant NSCLC cells into three functional modules (metabolic reprogramming, cell cycle, and cell stemness), which can serve as a functional classification of malignant tumor cells in NSCLC. These findings also indicate that metabolism, the cell cycle, and tumor stemness play important driving roles in the malignant evolution of NSCLC. We integrated cNMF and XGBoost to select marker genes indicative of both early and advanced NSCLC stages. At the single-cell level, the expression of genes such as CHCHD2, GAPDH, and CD24 was strongly correlated with the malignant evolution of NSCLC, and these genes were validated with histological data. The eight-gene risk score model that we established was ultimately validated with GEO data.
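A gene-based risk score of this kind is commonly computed as a weighted sum of expression values, with the weights taken from a penalized regression such as LASSO. The sketch below illustrates that general form. This summary does not list the eight genes or their coefficients, so the three gene names, the coefficients, and the patient values are hypothetical placeholders. Splitting patients at the median score is also an assumption made for illustration, not a detail confirmed by the source.

```python
from statistics import median

# Hypothetical coefficients standing in for LASSO-derived weights.
COEFFICIENTS = {"GENE_1": 0.5, "GENE_2": -0.25, "GENE_3": 0.25}


def risk_score(sample, coefficients):
    """Weighted sum of a sample's expression values over the model genes."""
    return sum(coef * sample[gene] for gene, coef in coefficients.items())


def stratify(samples, coefficients):
    """Score every sample and split the cohort at the median score."""
    scores = {sid: risk_score(expr, coefficients) for sid, expr in samples.items()}
    cutoff = median(scores.values())
    groups = {sid: ("high" if s > cutoff else "low") for sid, s in scores.items()}
    return scores, cutoff, groups


# Hypothetical toy cohort of four patients.
samples = {
    "P1": {"GENE_1": 2.0, "GENE_2": 1.0, "GENE_3": 0.5},
    "P2": {"GENE_1": 0.5, "GENE_2": 3.0, "GENE_3": 1.0},
    "P3": {"GENE_1": 3.0, "GENE_2": 0.5, "GENE_3": 2.0},
    "P4": {"GENE_1": 1.0, "GENE_2": 2.0, "GENE_3": 1.0},
}

scores, cutoff, groups = stratify(samples, COEFFICIENTS)
for sid in samples:
    print(f"{sid}: score = {scores[sid]:.3f}, group = {groups[sid]}")
```

In practice the coefficients would be fitted on a training cohort (here, TCGA) and then applied unchanged to an independent cohort (here, GEO) to check whether the high and low groups differ in outcome.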

In summary, our study contributes to the identification of temporally heterogeneous biomarkers in NSCLC, offering insights into disease progression mechanisms and potential therapeutic targets. The developed workflow shows promise for future clinical application.

The online version contains supplementary material available at 10.1186/s12957-024-03435-0.

## Linked entities

- **Genes:** CHCHD2 (coiled-coil-helix-coiled-coil-helix domain containing 2) [NCBI Gene 51142], GAPDH (glyceraldehyde-3-phosphate dehydrogenase) [NCBI Gene 2597], CD24 (CD24 molecule) [NCBI Gene 100133941]
- **Diseases:** non-small cell lung cancer (MONDO:0005233), NSCLC (MONDO:0005233)

## Full-text entities

- **Genes:** CHCHD2 (coiled-coil-helix-coiled-coil-helix domain containing 2) [NCBI Gene 51142] {aka C7orf17, MIX17B, MNRR1, NS2TP, PARK22}, CD24 (CD24 molecule) [NCBI Gene 100133941] {aka CD24A}, GAPDH (glyceraldehyde-3-phosphate dehydrogenase) [NCBI Gene 2597] {aka G3PD, GAPD, HEL-S-162eP}
- **Diseases:** Cancer (MESH:D009369), NSCLC (MESH:D002289)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC11170806/full.md

## Figures

11 figures with captions in the complete paper: https://tomesphere.com/paper/PMC11170806/full.md

## References

71 references — full list in the complete paper: https://tomesphere.com/paper/PMC11170806/full.md

---
Source: https://tomesphere.com/paper/PMC11170806