# MASE-GC: a multi-omics autoencoder and stacking ensemble framework for gastric cancer classification

**Authors:** Di Liu, Zhongguang Che, Guannan Xu, Ye Huang

PMC · DOI: 10.3389/fcell.2025.1704237 · 2025-11-12

## TL;DR

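MASE-GC classifies gastric cancer from four omics layers (exon, mRNA, and miRNA expression plus DNA methylation) using modality-specific autoencoders and a stacking ensemble, reaching 0.981 accuracy on the TCGA-STAD cohort and above 0.958 on external cohorts.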

## Contribution

MASE-GC combines modality-specific autoencoders, weighted fusion of the resulting latent features, and a stacking ensemble (five base learners plus an XGBoost meta-classifier) into a single framework for gastric cancer classification.

## Key findings

- MASE-GC achieved 0.981 accuracy and a 0.9883 F1-score on the TCGA-STAD cohort.
- The model generalized to external validation cohorts (GSE62254, GSE15459, GSE84437, and ICGC), with accuracy consistently above 0.958 and F1-scores above 0.969.
- In ablation studies, CNN and Random Forest provided the largest performance gains among the ensemble components.

## Abstract

Gastric cancer (GC) is one of the most common malignant tumors and remains a leading cause of cancer-related mortality worldwide. Accurate classification of GC is critical for improving diagnosis, prognosis, and personalized treatment. Recent advances in high-throughput sequencing have enabled the generation of large-scale multi-omics data, offering new opportunities for precise disease stratification. However, existing studies often rely on single-omics approaches or single-model frameworks, which fail to capture the full complexity of tumor biology and suffer from limited sensitivity, specificity, and generalizability.

We propose MASE-GC (Multi-Omics Autoencoder and Stacking Ensemble for Gastric Cancer), a novel computational framework that integrates exon expression, mRNA expression, miRNA expression, and DNA methylation profiles. MASE-GC employs modality-specific autoencoders to extract compact latent features from heterogeneous omics layers and combines them through weighted fusion. The integrated features are then classified using a stacking ensemble of five base learners—Support Vector Machine, Random Forest, Decision Tree, AdaBoost, and Convolutional Neural Network—followed by an XGBoost meta-classifier. A robust preprocessing pipeline, including feature filtering, normalization, and SMOTE–Tomek balancing, is incorporated to address noise, high dimensionality, and class imbalance.
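The data flow described above (per-modality latent vectors, weighted fusion, then base-learner predictions feeding a meta-classifier) can be sketched in plain Python. This is a minimal illustration, not the authors' implementation. The abstract does not specify the fusion formula, the weights, or the fold scheme, so the following are all assumptions: scaling each latent vector by a normalized weight before concatenation, the weights `0.6` and `0.4`, the two-fold split, and the use of out-of-fold predictions as meta-features. The two toy base learners stand in for the paper's SVM, Random Forest, Decision Tree, AdaBoost, and CNN. The XGBoost meta-classifier is omitted, and it would be trained on the `meta` matrix.

```python
from typing import Callable, Dict, List, Sequence

Vector = List[float]
Predictor = Callable[[Sequence[float]], float]
Fitter = Callable[[List[Vector], List[int]], Predictor]


def weighted_fusion(latents: Dict[str, Sequence[float]],
                    weights: Dict[str, float]) -> Vector:
    """Scale each modality's latent vector by its normalized weight, then concatenate.

    Assumption: the paper's exact fusion rule is not given in the abstract.
    """
    total = sum(weights[name] for name in latents)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    fused: Vector = []
    for name in sorted(latents):  # fixed order keeps feature positions stable
        w = weights[name] / total
        fused.extend(w * x for x in latents[name])
    return fused


def fit_nearest_centroid(X: List[Vector], y: List[int]) -> Predictor:
    """Toy base learner: predict the class whose training centroid is closer."""
    def centroid(label: int) -> Vector:
        rows = [x for x, t in zip(X, y) if t == label]
        return [sum(col) / len(rows) for col in zip(*rows)]

    c0, c1 = centroid(0), centroid(1)

    def predict(x: Sequence[float]) -> float:
        d0 = sum((a - b) ** 2 for a, b in zip(x, c0))
        d1 = sum((a - b) ** 2 for a, b in zip(x, c1))
        return 1.0 if d1 < d0 else 0.0

    return predict


def fit_sum_threshold(X: List[Vector], y: List[int]) -> Predictor:
    """Toy base learner: threshold the feature sum halfway between class means."""
    def mean_sum(label: int) -> float:
        sums = [sum(x) for x, t in zip(X, y) if t == label]
        return sum(sums) / len(sums)

    m0, m1 = mean_sum(0), mean_sum(1)
    cut = (m0 + m1) / 2
    sign = 1.0 if m1 >= m0 else -1.0
    return lambda x: 1.0 if sign * (sum(x) - cut) > 0 else 0.0


def out_of_fold_meta_features(X: List[Vector], y: List[int],
                              fitters: List[Fitter], n_folds: int) -> List[Vector]:
    """Each sample's meta-features come from learners that never saw that sample."""
    n = len(X)
    meta = [[0.0] * len(fitters) for _ in range(n)]
    for fold in range(n_folds):
        held_out = [i for i in range(n) if i % n_folds == fold]
        train = [i for i in range(n) if i % n_folds != fold]
        for j, fit in enumerate(fitters):
            predict = fit([X[i] for i in train], [y[i] for i in train])
            for i in held_out:
                meta[i][j] = predict(X[i])
    return meta


# Hypothetical latent vectors for two modalities (illustrative values only).
weights = {"mrna": 0.6, "mirna": 0.4}
samples = [
    ({"mrna": [0.1, 0.2], "mirna": [0.0, 0.1]}, 0),
    ({"mrna": [0.2, 0.1], "mirna": [0.1, 0.0]}, 0),
    ({"mrna": [0.0, 0.3], "mirna": [0.2, 0.1]}, 0),
    ({"mrna": [0.3, 0.0], "mirna": [0.1, 0.2]}, 0),
    ({"mrna": [0.9, 0.8], "mirna": [1.0, 0.9]}, 1),
    ({"mrna": [0.8, 0.9], "mirna": [0.9, 1.0]}, 1),
    ({"mrna": [1.0, 0.7], "mirna": [0.8, 0.9]}, 1),
    ({"mrna": [0.7, 1.0], "mirna": [0.9, 0.8]}, 1),
]
X = [weighted_fusion(latents, weights) for latents, _ in samples]
y = [label for _, label in samples]
meta = out_of_fold_meta_features(X, y, [fit_nearest_centroid, fit_sum_threshold], n_folds=2)
```

Each row of `meta` holds one prediction per base learner for one sample. Building these rows from held-out folds is a common way to keep the meta-classifier from learning on predictions that the base learners made for their own training data.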

Comprehensive experiments on the TCGA-STAD cohort demonstrated that MASE-GC achieved superior classification performance compared with single-omics and baseline methods, reaching an accuracy of 0.981, precision of 0.9845, recall of 0.992, F1-score of 0.9883, and specificity of 0.824. Ablation studies confirmed the complementary contributions of autoencoders and ensemble components, with CNN and Random Forest providing the largest performance gains. Furthermore, independent validation on external cohorts (GSE62254, GSE15459, GSE84437, and ICGC) confirmed the robustness and generalizability of MASE-GC, with accuracy consistently above 0.958 and F1-scores exceeding 0.969.
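For readers who want to relate these metrics to one another, the sketch below computes them from confusion-matrix counts and checks that the reported F1-score follows from the reported precision and recall. The counts passed to `classification_metrics` are hypothetical and chosen only for illustration. They are not taken from the paper. The function names are likewise illustrative.

```python
def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Standard binary metrics from confusion-matrix counts."""
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "precision": tp / (tp + fp),
        "recall": tp / (tp + fn),          # also called sensitivity
        "specificity": tn / (tn + fp),
        "f1": 2 * tp / (2 * tp + fp + fn),
    }


def f1_from(precision: float, recall: float) -> float:
    """F1 is the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Hypothetical, imbalanced counts: 100 positives and 20 negatives.
toy = classification_metrics(tp=95, fp=3, tn=17, fn=5)

# Consistency check using the precision and recall reported in the abstract.
f1_check = f1_from(0.9845, 0.992)
```

With the reported precision of 0.9845 and recall of 0.992, `f1_check` agrees with the reported F1-score of 0.9883 to within rounding. The toy counts also show how specificity can sit well below recall when negative samples are few, since each false positive then costs a larger share of the negative class.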

MASE-GC advances computational oncology by offering an effective and generalizable framework for GC classification. By integrating multi-omics fusion, ensemble learning, and robust preprocessing, the proposed model improves both sensitivity and specificity, reduces false positives, and demonstrates strong potential for clinical translation in precision diagnostics and treatment planning.

## Linked entities

- **Diseases:** gastric cancer (MONDO:0001056)

## Full-text entities

- **Diseases:** GC (MESH:D013274), cancer (MESH:D009369)

## Figures

Five figures with captions are available in the complete paper: https://tomesphere.com/paper/PMC12647936/full.md

---
Source: https://tomesphere.com/paper/PMC12647936