# Machine learning-based single-sample molecular classifier for cancer grading

**Authors:** Zoia Antysheva, Nikita Kotlov, Mariia V. Guryleva, Ivan Valiev, Viktor Svekolkin, Anna Belozerova, Sheila T. Yong, Dmitry Tabakov, Alexander Bagaev, Vladimir Kushnarev

PMC · DOI: 10.3389/fonc.2025.1617898 · Frontiers in Oncology · 2025-07-16

## TL;DR

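This paper introduces a machine learning classifier that uses gene expression data from a single sample to assign breast, lung, and renal tumors to low- or high-grade risk groups, including intermediate-grade (G2) tumors whose morphological grade has no clear prognostic significance.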

## Contribution

A novel single-sample molecular classifier for cancer grading that works with RNA-seq or microarray data without cohort scaling.

## Key findings

- mGrades strongly correlate with pathologist-assigned histological grades and clinical stage.
- The classifier effectively assesses risk levels for intermediate-grade (G2) cancer samples.
- Common and unique genetic features were identified across low and high mGrades in multiple cancer types.

## Abstract

Tumor subtyping based on morphological grade is used in cancer treatment and management decision-making and to determine a patient’s prognosis. While low- and high-grade tumors are predictive of patient survival for many cancers, tumors of intermediate morphological grades are considered unreliable due to interobserver variability and thus do not have clear prognostic significance. To address this issue, we devised a molecular-based classifier that uses gene expression data from RNA sequencing (RNA-seq) or microarray profiling to predict high- and low-grade risk groups for breast, lung, and renal cancers. For this classifier, we developed a preprocessing procedure that only required expression data from a single sample, without the need for any batch correction or cohort scaling. This classifier, while trained only on RNA sequencing data, achieves highly accurate risk predictions on both RNA-seq and microarray data. First, the molecular grades (mGrades) predicted by this classifier correlated strongly with the pathologist-assigned histological grades and clinical stage. Next, we showed that mGrades were effective in assessing risk levels for G2 samples. Finally, we identified common and unique biological and genetic features in samples of low and high mGrades across breast, lung, and renal cancers. Gene expression patterns as revealed by the classifier can provide useful information for both research and diagnostic purposes.
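The abstract does not describe the preprocessing procedure itself. As a rough illustration of how a single sample can be prepared without batch correction or cohort scaling, the sketch below uses a within-sample rank transform, a common technique that is assumed here and is not confirmed as the authors' method. The function name `rank_normalize` and all expression values are hypothetical; the gene symbols are taken from the entity list only as placeholders.

```python
import math
from typing import Dict


def rank_normalize(expr: Dict[str, float]) -> Dict[str, float]:
    """Map each gene to its within-sample rank, scaled to [0, 1].

    Only the ordering of genes inside this one sample is used, so no
    other samples (and no cohort-level statistics) are needed.
    Tied values share the average of their ranks.
    """
    genes = sorted(expr, key=expr.get)  # ascending by expression
    n = len(genes)
    if n == 0:
        return {}
    if n == 1:
        return {genes[0]: 0.5}

    scaled: Dict[str, float] = {}
    i = 0
    while i < n:
        # Extend j over the run of genes tied with genes[i].
        j = i
        while j + 1 < n and expr[genes[j + 1]] == expr[genes[i]]:
            j += 1
        avg_rank = (i + j) / 2.0
        for k in range(i, j + 1):
            scaled[genes[k]] = avg_rank / (n - 1)
        i = j + 1
    return scaled


# Hypothetical RNA-seq-like values for one sample (invented numbers).
rnaseq_sample = {"MYC": 120.0, "CCNB1": 45.0, "CDK1": 45.0,
                 "CDH1": 300.0, "PCNA": 80.0}

# The same sample on a different, monotonically related scale,
# standing in for a microarray-style log intensity (an assumption).
array_like_sample = {g: 1.7 * math.log2(v + 1.0) + 3.0
                     for g, v in rnaseq_sample.items()}

# Both scales yield identical features, so a model trained on one
# platform can be applied to the other without cohort scaling.
print(rank_normalize(rnaseq_sample) == rank_normalize(array_like_sample))
```

A rank transform discards absolute expression levels, which is the trade-off that makes it platform-agnostic: any strictly increasing rescaling of a sample leaves the features unchanged. Whether the published classifier relies on ranks, gene ratios, or another single-sample scheme should be checked against the full paper.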

## Linked entities

- **Diseases:** cancer (MONDO:0004992), breast cancer (MONDO:0004989), lung cancer (MONDO:0005138), renal cancer (MONDO:0005206)

## Full-text entities

- **Genes:** RYR2 (ryanodine receptor 2) [NCBI Gene 6262] {aka ARVC2, ARVD2, RYR-2, RyR, VACRDS, VTSIP}, Mmp7 (matrix metallopeptidase 7) [NCBI Gene 17393] {aka MAT, MMP-7}, MYC (MYC proto-oncogene, bHLH transcription factor) [NCBI Gene 4609] {aka MRTL, MYCC, bHLHe39, c-Myc}, CCNE1 (cyclin E1) [NCBI Gene 898] {aka CCNE, pCCNE1}, EGFR (epidermal growth factor receptor) [NCBI Gene 1956] {aka ERBB, ERBB1, ERRP, HER1, NISBD2, NNCIS}, CDKN2A (cyclin dependent kinase inhibitor 2A) [NCBI Gene 1029] {aka ARF, CAI2, CDK4I, CDKN2, CMM2, INK4}, RB1 (RB transcriptional corepressor 1) [NCBI Gene 5925] {aka OSRC, PPP1R130, RB, p105-Rb, p110-RB1, pRb}, CCNB1 (cyclin B1) [NCBI Gene 891] {aka CCNB}, FN1 (fibronectin 1) [NCBI Gene 2335] {aka CIG, ED-B, FINC, FN, FNZ, GFND}, CHEK1 (checkpoint kinase 1) [NCBI Gene 1111] {aka CHK1, OZEMA21}, CSMD1 (CUB and Sushi multiple domains 1) [NCBI Gene 64478] {aka PPP1R24}, SERPINE1 (serpin family E member 1) [NCBI Gene 5054] {aka PAI, PAI-1, PAI1, PLANH1}, Slc7a5 (solute carrier family 7 (cationic amino acid transporter, y+ system), member 5) [NCBI Gene 20539] {aka 4F2LC, D0H16S474E, Gm42049, LAT1, TA1}, CCND1 (cyclin D1) [NCBI Gene 595] {aka BCL1, D11S287E, PRAD1, U21B31}, KRAS (KRAS proto-oncogene, GTPase) [NCBI Gene 3845] {aka 'C-K-RAS, C-K-RAS, CFC2, K-RAS2A, K-RAS2B, K-RAS4A}, MAP3K1 (mitogen-activated protein kinase kinase kinase 1) [NCBI Gene 4214] {aka MAPKKK1, MEKK, MEKK 1, MEKK1, SRXY6}, BRCA1 (BRCA1 DNA repair associated) [NCBI Gene 672] {aka BRCAI, BRCC1, BROVCA1, FANCS, IRIS, PNCA4}, CDK1 (cyclin dependent kinase 1) [NCBI Gene 983] {aka CDC2, CDC28A, P34CDC2}, TP53 (tumor protein p53) [NCBI Gene 7157] {aka BCC7, BMFS5, LFS1, P53, TRP53}, ERBB2 (erb-b2 receptor tyrosine kinase 2) [NCBI Gene 2064] {aka CD340, HER-2, HER-2/neu, HER2, MLN 19, MLN-19}, PCNA (proliferating cell nuclear antigen) [NCBI Gene 5111] {aka ATLD2}, Birc5 (baculoviral IAP repeat-containing 5) [NCBI Gene 11799] {aka AAC-11, Api4, TIAP, survivin40}, Cthrc1 (collagen triple helix repeat containing 1) [NCBI Gene 68588] {aka 1110014B07Rik}, TERT (telomerase reverse transcriptase) [NCBI Gene 7015] {aka CMM9, DKCA2, DKCB4, EST2, PFBMFT1, TCS1}, Tpx2 (TPX2, microtubule-associated) [NCBI Gene 72119] {aka 2610005B21Rik, DIL2, REPP86, p100}, MAPK1 (mitogen-activated protein kinase 1) [NCBI Gene 5594] {aka ERK, ERK-2, ERK2, ERT1, MAPK2, NS13}, CDH1 (cadherin 1) [NCBI Gene 999] {aka Arc-1, BCDS1, CD324, CDHE, ECAD, LCAM}, MUC16 (mucin 16, cell surface associated) [NCBI Gene 94025] {aka CA125}, PIK3CA (phosphatidylinositol-4,5-bisphosphate 3-kinase catalytic subunit alpha) [NCBI Gene 5290] {aka CCM4, CLAPO, CLOVE, CWS5, HMH, MCAP}
- **Diseases:** distant metastasis (MESH:D009362), kidney cancer (MESH:D007680), pancreatic cancer (MESH:D010190), fibrosis (MESH:D005355), cancerous tumors (MESH:D009369), inflammation (MESH:D007249), Amp (MESH:C567878), Clear cell renal cell carcinoma (MESH:D002292), CAP (OMIM:115650), TNBC (MESH:D064726), LUAD (MESH:D000077192), Lung Cancer (MESH:D008175), Luminal B (MESH:D006509), carcinogenesis (MESH:D063646), Hypoxia (MESH:D000860), breast cancer (MESH:D001943)
- **Chemicals:** sunitinib (MESH:D000077210), ROS (-)
- **Species:** Homo sapiens (human, species) [taxon 9606]
- **Cell lines:** mG1: Mus musculus (Mouse), Spontaneously immortalized cell line (CVCL_6824); CPTAC-3: Mus musculus (Mouse), Hybridoma (CVCL_C6V6); mG3: Mus musculus (Mouse), Mouse erythroid leukemia, Cancer cell line (CVCL_Y480)

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12307393/full.md

## Figures

3 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12307393/full.md

## References

79 references — full list in the complete paper: https://tomesphere.com/paper/PMC12307393/full.md

---
Source: https://tomesphere.com/paper/PMC12307393