# Hierarchical learning of gastric cancer molecular subtypes by integrating multi‐modal DNA‐level omics data and clinical stratification

**Authors:** Binyu Yang, Siying Liu, Jiemin Xie, Xi Tang, Pan Guan, Yifan Zhu, Xuemei Liu, Yunhui Xiong, Zuli Yang, Weiyao Li, Yonghua Wang, Wen Chen, Qingjiao Li, Li C. Xia

PMC · DOI: 10.1002/qub2.45 · Quantitative Biology · 2024-05-13

## TL;DR

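This paper introduces HCG, a DNA-based hierarchical classifier that assigns gastric cancer molecular subtypes from gene mutations, copy number aberrations, and methylation patterns. It reaches an overall auROC of 0.95 and significantly improves the clinical stratification of patients (overall p-value = 0.032).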

## Contribution

A novel hierarchical classifier (HCG) for gastric cancer subtyping using DNA-level omics data and clinical stratification is developed and validated.

## Key findings

- The HCG classifier achieved an overall auROC of 0.95, accuracy of 0.88, F1 score of 0.87, and auPRC of 0.86.
- 25 subtype-specific DNA alterations were identified, including a high mutation rate in SYNE1 for the MSI subtype and hypermethylation of ALS2CL for the EBV subtype.
- HCG significantly improved the clinical stratification of patients in multivariate survival analysis (overall p-value = 0.032).
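
For readers who want to compute the same kinds of metrics on their own predictions, here is a minimal sketch in plain Python. It assumes macro-averaged F1 and a binary one-vs-rest auROC; this summary does not state how the paper averages its "overall" scores. The function names and the toy labels and scores are invented for illustration and are not data from the study.

```python
def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted subtype matches the reference."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores (averaging scheme is an assumption)."""
    classes = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def binary_auroc(labels, scores):
    """auROC as the probability that a positive outranks a negative (ties count 0.5)."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy example (hypothetical labels, not results from the paper).
y_true = ["EBV", "MSI", "GS", "CIN", "CIN", "MSI"]
y_pred = ["EBV", "MSI", "CIN", "CIN", "CIN", "GS"]
acc = accuracy(y_true, y_pred)
f1 = macro_f1(y_true, y_pred)

# One-vs-rest scores for a single subtype (1 = that subtype, 0 = any other).
auc = binary_auroc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1])
```

For a multi-class problem, a one-vs-rest auROC like this would be computed per subtype and then averaged; whether the paper uses macro or micro averaging is not stated here.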

## Abstract

Molecular subtyping of gastric cancer (GC) aims to comprehend its genetic landscape. However, the efficacy of current subtyping methods is hampered by their mixed use of molecular features, a lack of strategy optimization, and the limited availability of public GC datasets. There is a pressing need for a precise and easily adoptable subtyping approach for early DNA‐based screening and treatment. Based on TCGA subtypes, we developed a novel DNA‐based hierarchical classifier for gastric cancer molecular subtyping (HCG), which employs gene mutations, copy number aberrations, and methylation patterns as predictors. By incorporating the closely related esophageal adenocarcinoma dataset, we expanded the TCGA GC dataset for the training and testing of HCG (n = 453). HCG was optimized by comparing three hierarchical strategies built on Lasso‐Logistic regression, evaluated by their overall area under the receiver operating characteristic curve (auROC), accuracy, F1 score, and area under the precision‐recall curve (auPRC), and by their capability for clinical stratification using multivariate survival analysis. Subtype‐specific DNA alteration biomarkers were discerned through difference tests based on HCG‐defined subtypes. Our HCG classifier demonstrated superior performance in terms of overall auROC (0.95), accuracy (0.88), F1 score (0.87) and auPRC (0.86), significantly improving the clinical stratification of patients (overall p‐value = 0.032). Difference tests identified 25 subtype‐specific DNA alterations, including a high mutation rate in the SYNE1, ITGB4, and COL22A1 genes for the MSI subtype, and hypermethylation of the ALS2CL, KIAA0406, and RPRD1B genes for the EBV subtype. HCG is an accurate and robust classifier for DNA‐based GC molecular subtyping with highly predictive clinical stratification performance. The training and test datasets, along with the analysis programs of HCG, are accessible on GitHub (github.com/LabxSCUT).
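
To make the idea of a hierarchical Lasso‐Logistic classifier concrete, here is a minimal sketch in plain Python. It is not the authors' implementation. It assumes one possible hierarchy over the four TCGA subtypes (EBV first, then MSI, then GS versus CIN); the abstract does not say which of the three strategies was chosen. The penalty strength, learning rate, feature layout, and toy data are all hypothetical, and the L1 penalty is applied with a simple proximal gradient (soft-thresholding) loop rather than a library solver.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_lasso_logistic(X, y, lam=0.01, lr=0.5, n_iter=500):
    """L1-penalized logistic regression via proximal gradient descent."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(n_iter):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            grad_b += err / n
            for j in range(d):
                grad_w[j] += err * xi[j] / n
        b -= lr * grad_b  # the intercept is not penalized
        for j in range(d):
            v = w[j] - lr * grad_w[j]
            # Soft-thresholding applies the L1 (Lasso) penalty.
            w[j] = math.copysign(max(abs(v) - lr * lam, 0.0), v)
    return w, b


def predict_proba(model, x):
    w, b = model
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


# Assumed node order: each node separates one subtype from the remaining ones.
HIERARCHY = ["EBV", "MSI", "GS"]
LEAF = "CIN"  # whatever is left after the last node


def fit_hierarchy(X, labels):
    models, remaining = [], list(zip(X, labels))
    for subtype in HIERARCHY:
        Xn = [x for x, _ in remaining]
        yn = [1 if l == subtype else 0 for _, l in remaining]
        models.append(fit_lasso_logistic(Xn, yn))
        # Samples of this subtype are peeled off before training the next node.
        remaining = [(x, l) for x, l in remaining if l != subtype]
    return models


def predict_hierarchy(models, x):
    for subtype, model in zip(HIERARCHY, models):
        if predict_proba(model, x) >= 0.5:
            return subtype
    return LEAF


# Toy data: four hypothetical marker features (one per subtype) plus one
# uninformative feature; 10 samples per subtype.
SUBTYPES = ["EBV", "MSI", "GS", "CIN"]
X, labels = [], []
for k, subtype in enumerate(SUBTYPES):
    for i in range(10):
        row = [1.0 if j == k else 0.0 for j in range(4)]
        row.append(((i % 3) - 1) * 0.1)  # small uninformative feature
        X.append(row)
        labels.append(subtype)

models = fit_hierarchy(X, labels)
predictions = [predict_hierarchy(models, x) for x in X]
```

On this toy data every node learns a nonzero weight only for informative markers, while the uninformative fifth feature is shrunk to exactly zero. That sparsity is the practical reason to pair logistic regression with a Lasso penalty when there are many candidate DNA‐level predictors.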

## Linked entities

- **Genes:** SYNE1 (spectrin repeat containing nuclear envelope protein 1) [NCBI Gene 23345], ITGB4 (integrin subunit beta 4) [NCBI Gene 3691], COL22A1 (collagen type XXII alpha 1 chain) [NCBI Gene 169044], ALS2CL (ALS2 C-terminal like) [NCBI Gene 259173], TTI1 (TELO2 interacting protein 1) [NCBI Gene 9675], RPRD1B (regulation of nuclear pre-mRNA domain containing 1B) [NCBI Gene 58490]
- **Diseases:** gastric cancer (MONDO:0001056)

## Full-text entities

- **Genes:** RPRD1B (regulation of nuclear pre-mRNA domain containing 1B) [NCBI Gene 58490] {aka C20orf77, CREPT, K-H, Kub5-Hera, NET60, dJ1057B20.2}, ALS2CL (ALS2 C-terminal like) [NCBI Gene 259173] {aka RN49018}, ITGB4 (integrin subunit beta 4) [NCBI Gene 3691] {aka CD104, GP150, JEB5A, JEB5B}, TTI1 (TELO2 interacting protein 1) [NCBI Gene 9675] {aka KIAA0406, NEDMIM, smg-10}, COL22A1 (collagen type XXII alpha 1 chain) [NCBI Gene 169044], SYNE1 (spectrin repeat containing nuclear envelope protein 1) [NCBI Gene 23345] {aka 8B, AMC3, AMCM, ARCA1, C6orf98, CPG2}
- **Diseases:** GC (MESH:D013274), esophageal adenocarcinomas (MESH:D000230)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12806395/full.md

## Figures

3 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12806395/full.md

## References

77 references — full list in the complete paper: https://tomesphere.com/paper/PMC12806395/full.md

---
Source: https://tomesphere.com/paper/PMC12806395