# DeepCMS: A Feature Selection-Driven Model for Cancer Molecular Subtyping with a Case Study on Testicular Germ Cell Tumors

**Authors:** Mehwish Wahid Khan, Ghufran Ahmed, Muhammad Shahzad, Abdallah Namoun, Shahid Hussain, Meshari Huwaytim Alanazi

PMC · DOI: 10.3390/diagnostics15212730 · Diagnostics · 2025-10-28

## TL;DR

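DeepCMS is a deep learning framework for cancer molecular subtyping. It converts gene expression data into enrichment scores, keeps the top 2000 of more than 22,000 resulting features, and classifies subtypes with a feed-forward neural network, yielding accurate and robust classification.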

## Contribution

DeepCMS introduces a framework that combines gene set enrichment analysis, feature selection, and feed-forward neural networks to build a compact, representative feature space for cancer molecular subtyping.

## Key findings

- DeepCMS consistently outperformed state-of-the-art models in aggregated accuracy, sensitivity, specificity, and balanced accuracy.
- All aggregated metrics exceeded 0.90 on independent test datasets, indicating that the framework generalizes well and is robust.
- Although developed on colon cancer gene expression data, the approach may be applied to other gene expression datasets; a case study on testicular germ cell tumors illustrates this.
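For readers less familiar with the metrics named above, the following is a minimal sketch of how sensitivity, specificity, and balanced accuracy can be computed per subtype in a one-vs-rest fashion and then averaged. The macro-averaging step, the function names, and the toy labels `A`, `B`, `C` are assumptions made for illustration; this summary does not state how the paper aggregates its metrics.

```python
def per_class_metrics(y_true, y_pred):
    """One-vs-rest sensitivity, specificity and balanced accuracy per label."""
    labels = sorted(set(y_true) | set(y_pred))
    n = len(y_true)
    out = {}
    for c in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == c and p == c)
        fn = sum(1 for t, p in pairs if t == c and p != c)
        fp = sum(1 for t, p in pairs if t != c and p == c)
        tn = n - tp - fn - fp
        sens = tp / (tp + fn) if (tp + fn) else 0.0
        spec = tn / (tn + fp) if (tn + fp) else 0.0
        out[c] = {
            "sensitivity": sens,
            "specificity": spec,
            "balanced_accuracy": (sens + spec) / 2,
        }
    return out


def macro_average(metrics):
    """Unweighted mean over labels (an assumed aggregation scheme)."""
    names = ["sensitivity", "specificity", "balanced_accuracy"]
    k = len(metrics)
    return {m: sum(v[m] for v in metrics.values()) / k for m in names}


# Hypothetical toy labels, not data from the paper.
y_true = ["A", "A", "B", "B", "C", "C"]
y_pred = ["A", "B", "B", "B", "C", "C"]

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
per_class = per_class_metrics(y_true, y_pred)
aggregated = macro_average(per_class)
print(accuracy, aggregated)
```

Balanced accuracy is useful when subtypes are unevenly represented, because it weights performance on the positive and negative class equally regardless of their sizes.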

## Abstract

Background/Objectives: Cancer is a chronic and heterogeneous disease, possessing molecular variation within a single type, resulting in its molecular subtypes. Cancer molecular subtyping offers biological insights into cancer variability, facilitating the development of personalized medicines. Various models have been proposed for cancer molecular subtyping, utilizing the high-dimensional transcriptomic, genomic, or proteomic data. The issue of data scarcity, characterized by high feature dimensionality and a limited sample size, remains a persistent problem. The objective of this research is to propose a deep learning framework, DeepCMS, that leverages the capabilities of feed-forward neural networks, gene set enrichment analysis, and feature selection to construct a well-representative subset of the feature space, thereby producing promising results. Methods: The gene expression data were transformed into enrichment scores, resulting in over 22,000 features. From those, the top 2000 features were selected, and deep learning was applied to these features. The encouraging outcomes indicate the efficacy of the proposed framework in terms of defining a well-representative feature space and accurately classifying cancer molecular subtypes. Results: DeepCMS consistently outperformed state-of-the-art models in aggregated accuracy, sensitivity, specificity, and balanced accuracy. The aggregated metrics surpassed 0.90 for all efficiency measures on independent test datasets, showing the generalizability and robustness of our framework. Although developed using colon cancer’s gene expression data, this approach may be applied to any gene expression data; a case study is also devised for illustration. Conclusions: Overall, the proposed DeepCMS framework enables the accurate and robust classification of cancer molecular subtypes using a compact and informative feature set, facilitating improved precision in oncology applications.
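The Methods steps in the abstract (expression values to enrichment scores, then top-k feature selection) can be sketched as follows. This is an illustrative stand-in, not the authors' implementation: the enrichment score here is simply the mean per-gene z-score over a gene set, the ranking criterion is variance across samples, and all names (`enrichment_scores`, `select_top_k`, the toy genes and gene sets) are hypothetical. The abstract does not specify the enrichment method or the selection criterion, and the feed-forward classifier that consumes the selected features is omitted.

```python
from statistics import mean, pstdev, pvariance


def zscore_genes(expr):
    """Standardise each gene across samples; constant genes map to 0."""
    out = {}
    for gene, vals in expr.items():
        mu, sd = mean(vals), pstdev(vals)
        out[gene] = [(v - mu) / sd if sd > 0 else 0.0 for v in vals]
    return out


def enrichment_scores(expr, gene_sets):
    """Assumed stand-in score: mean z-score of a set's genes, per sample."""
    z = zscore_genes(expr)
    n_samples = len(next(iter(expr.values())))
    scores = {}
    for name, genes in gene_sets.items():
        present = [g for g in genes if g in z]
        if not present:
            continue  # skip gene sets with no measured genes
        scores[name] = [
            mean(z[g][i] for g in present) for i in range(n_samples)
        ]
    return scores


def select_top_k(scores, k):
    """Keep the k gene-set features with the highest variance (assumed criterion)."""
    ranked = sorted(scores, key=lambda s: pvariance(scores[s]), reverse=True)
    return ranked[:k]


# Hypothetical toy data: 4 genes measured in 4 samples.
expr = {
    "g1": [1, 2, 3, 4],
    "g2": [4, 3, 2, 1],
    "g3": [0, 0, 5, 5],
    "g4": [2, 2, 2, 2],
}
gene_sets = {
    "set_a": ["g1", "g3"],
    "set_b": ["g1", "g2"],
    "set_c": ["g4", "g3"],
}

scores = enrichment_scores(expr, gene_sets)
selected = select_top_k(scores, k=2)  # the paper keeps 2000 of over 22,000
print(selected)
```

In this toy example `set_b` is dropped because its two genes move in opposite directions and cancel out, leaving a feature with no variation across samples. Reducing the feature space before training is what lets a neural network be fit on a limited number of samples, which is the data scarcity problem the abstract describes.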

## Linked entities

- **Diseases:** cancer (MONDO:0004992), colon cancer (MONDO:0002032)

## Full-text entities

- **Diseases:** colon cancer (MESH:D015179), Cancer (MESH:D009369), Germ Cell Tumors (MESH:D009373)

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12607777/full.md

## Figures

5 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12607777/full.md

## References

43 references — full list in the complete paper: https://tomesphere.com/paper/PMC12607777/full.md

---
Source: https://tomesphere.com/paper/PMC12607777