# DeepGene-BC: Deep Learning-Based Breast Cancer Subtype Prediction via Somatic Point Mutation Profiles

**Authors:** Pengfei Hou, Liangjie Liu, Yijia Duan, Shanshan Yin, Wenqian Yan, Chongchen Pang, Yang Yan, Sabreena Aziz, Mika Torhola, Henna Kujanen, Klaus Förger, Hui Shi, Guang He, Yi Shi

PMC · DOI: 10.3390/cancers18040570 · Cancers · 2026-02-09

## TL;DR

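This paper introduces deepGene-BC, a deep learning method that predicts breast cancer molecular subtypes from somatic point mutation profiles. Validated on tissue-derived sequencing data as a proof of concept, it is a step toward less invasive subtyping than current biopsy- and transcriptome-based methods.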

## Contribution

The novel contribution is a deep learning framework that combines pathway-informed feature selection with a hybrid neural network for sparse mutation data.
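To make the idea of a hybrid network for sparse binary input concrete, here is a minimal forward-pass sketch. It is not the authors' architecture, which this summary does not specify. It assumes a DeepFM-style layout in which a linear branch, a pairwise-interaction branch (bi-interaction pooling over per-gene embeddings), and a one-hidden-layer MLP each produce class logits that are summed before a softmax. All names (`init_params`, `forward`), layer sizes, and the number of classes are hypothetical.

```python
import math
import random


def init_params(n_genes, n_classes, emb_dim=4, hidden=8, seed=0):
    """Randomly initialise weights for the three branches (illustrative sizes)."""
    rng = random.Random(seed)

    def mat(rows, cols):
        return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]

    return {
        "w_lin": mat(n_classes, n_genes),  # linear branch
        "emb": mat(n_genes, emb_dim),      # one embedding per gene
        "w_int": mat(n_classes, emb_dim),  # projects pooled interactions
        "w1": mat(hidden, n_genes),        # MLP hidden layer
        "w2": mat(n_classes, hidden),      # MLP output layer
    }


def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def forward(params, x):
    """Return class probabilities for one binary mutation vector x."""
    # Branch 1: additive per-gene effects.
    linear = matvec(params["w_lin"], x)

    # Branch 2: pairwise interactions among mutated genes, pooled as
    # 0.5 * ((sum of embeddings)^2 - sum of squared embeddings).
    emb_dim = len(params["emb"][0])
    active = [params["emb"][i] for i, xi in enumerate(x) if xi]
    pooled = []
    for k in range(emb_dim):
        s = sum(v[k] for v in active)
        sq = sum(v[k] ** 2 for v in active)
        pooled.append(0.5 * (s * s - sq))
    interaction = matvec(params["w_int"], pooled)

    # Branch 3: higher-order nonlinear patterns via a ReLU hidden layer.
    hidden = [max(0.0, h) for h in matvec(params["w1"], x)]
    deep = matvec(params["w2"], hidden)

    # Sum the branch logits and apply a numerically stable softmax.
    logits = [a + b + c for a, b, c in zip(linear, interaction, deep)]
    top = max(logits)
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


params = init_params(n_genes=6, n_classes=4)
probs = forward(params, [1, 0, 1, 0, 0, 1])
```

Because the sketch uses no bias terms, an all-zero mutation vector yields uniform class probabilities. A trained model would normally include biases and learn the weights by minimising cross-entropy.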

## Key findings

- deepGene-BC achieved an overall accuracy of 77.3% and an average sensitivity of 75.2% on an independent test set (n = 273) from the TCGA breast cancer cohort.
- The model demonstrated strong discriminative performance, with a macro-averaged AU-ROC of 0.94 (95% CI: 0.92–0.96).
- Mutation patterns were shown to recapitulate established transcriptome-defined subtypes.
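The metrics quoted above can be computed as in the following sketch. It assumes that "average sensitivity" means recall averaged over classes and that the macro-averaged AU-ROC is the unweighted mean of one-vs-rest AU-ROCs; the summary does not state either definition. The function names, class labels, and toy probabilities are hypothetical and are not data from the paper.

```python
def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted class matches the true class."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_sensitivity(y_true, y_pred):
    """Per-class recall, averaged with equal weight per class."""
    recalls = []
    for c in sorted(set(y_true)):
        idx = [i for i, t in enumerate(y_true) if t == c]
        recalls.append(sum(y_pred[i] == c for i in idx) / len(idx))
    return sum(recalls) / len(recalls)


def binary_auroc(labels, scores):
    """AU-ROC as the probability that a positive outranks a negative (ties count 0.5)."""
    pos = [s for l, s in zip(labels, scores) if l]
    neg = [s for l, s in zip(labels, scores) if not l]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def macro_auroc(y_true, prob_rows, classes):
    """Unweighted mean of one-vs-rest AU-ROCs, one per class."""
    aucs = []
    for k, c in enumerate(classes):
        labels = [t == c for t in y_true]
        scores = [row[k] for row in prob_rows]
        aucs.append(binary_auroc(labels, scores))
    return sum(aucs) / len(aucs)


# Toy example with made-up labels and probabilities (not results from the paper).
classes = ["A", "B", "C"]
y_true = ["A", "A", "B", "B", "C", "C"]
prob_rows = [
    [0.7, 0.2, 0.1],
    [0.4, 0.5, 0.1],
    [0.2, 0.6, 0.2],
    [0.3, 0.4, 0.3],
    [0.1, 0.2, 0.7],
    [0.2, 0.3, 0.5],
]
y_pred = [classes[row.index(max(row))] for row in prob_rows]
```

The pairwise AU-ROC above is quadratic in the number of samples, which is fine at the scale of a few hundred test cases. A confidence interval such as the one reported could be estimated by bootstrap resampling of the test set, although the summary does not say which method the authors used.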

## Abstract

Breast cancer subtypes are critical for treatment selection and disease monitoring, but current classification methods rely on invasive tumor biopsies and transcriptomic assays that may not be suitable for repeated sampling. Somatic DNA mutations provide stable molecular markers that can be detected from circulating cell-free DNA, offering opportunities for minimally invasive tumor profiling. In this study, we explore the feasibility of using mutation profiles to infer breast cancer molecular subtypes. We propose deepGene-BC, a computational framework that extracts subtype-associated signals from sparse somatic mutation data. Using tissue-derived sequencing data as a proof of concept, we demonstrate that mutation patterns can recapitulate established transcriptome-defined subtypes. This work establishes a foundation for future development of mutation-based liquid biopsy approaches for longitudinal disease monitoring and precision oncology.

**Background:** Molecular subtyping of breast cancer usually relies on transcriptomic profiles, a method constrained by limitations in robustness and clinical applicability. While somatic point mutations represent a stable genomic alternative, their predictive utility is hindered by high dimensionality, extreme sparsity, and weak single-gene associations.

**Methods:** Here, we present deepGene-BC, a deep learning framework that synergizes a pathway-informed feature selection strategy with a hybrid neural network tailored for sparse binary data. To distill sparse genome-wide mutations into a compact and interpretable feature set, deepGene-BC integrates mutation recurrence filtering, curated pathway priors, and mutual information-based gene prioritization. These refined features are subsequently modeled using a specialized hybrid architecture designed to capture complex linear effects, feature interactions, and higher-order nonlinear patterns.

**Results:** When benchmarked against an independent test set (n = 273) from the TCGA breast cancer cohort, deepGene-BC achieved an overall accuracy of 77.3% and an average sensitivity of 75.2%, accompanied by a strong overall discriminative performance (macro-averaged AU-ROC = 0.94, 95% CI: 0.92–0.96).

**Conclusions:** By effectively combining biologically informed feature engineering with deep learning, deepGene-BC holds significant promise for non-invasive molecular stratification and precision oncology.
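The three feature-selection ingredients named in the abstract (recurrence filtering, pathway priors, and mutual information ranking) can be sketched as follows. How the paper actually combines them is not described in this summary. The sketch assumes a simple pipeline: drop genes mutated in fewer than `min_recurrence` samples, keep only genes in a curated pathway set, then rank the remainder by mutual information with the subtype label. The function names, thresholds, the gene `GENE_X`, and the toy mutation matrix are hypothetical and do not reflect data or associations from the paper.

```python
import math
from collections import Counter


def mutual_information(feature, labels):
    """Mutual information (in bits) between a binary feature and class labels."""
    n = len(labels)
    joint = Counter(zip(feature, labels))
    px = Counter(feature)
    py = Counter(labels)
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x,y) * log2( p(x,y) / (p(x) * p(y)) ), written with raw counts.
        mi += (c / n) * math.log2(c * n / (px[x] * py[y]))
    return mi


def select_genes(mutations, labels, pathway_genes, min_recurrence=2, top_k=2):
    """Recurrence filter, then pathway filter, then rank by mutual information."""
    scored = []
    for gene, column in mutations.items():
        if sum(column) < min_recurrence:  # too rarely mutated to be informative
            continue
        if gene not in pathway_genes:     # assumed use of the pathway prior
            continue
        scored.append((mutual_information(column, labels), gene))
    # Highest mutual information first; ties broken alphabetically.
    scored.sort(key=lambda t: (-t[0], t[1]))
    return [gene for _, gene in scored[:top_k]]


# Toy binary mutation matrix: gene -> one 0/1 entry per sample (made-up values).
labels = ["s1", "s1", "s1", "s2", "s2", "s2"]
mutations = {
    "TP53": [1, 1, 1, 0, 0, 0],
    "PIK3CA": [0, 0, 1, 1, 1, 1],
    "GATA3": [0, 0, 0, 1, 0, 0],
    "AKT1": [1, 0, 0, 1, 0, 0],
    "GENE_X": [1, 1, 1, 0, 0, 0],  # hypothetical gene outside the pathway set
}
pathway_genes = {"TP53", "PIK3CA", "GATA3", "AKT1"}
selected = select_genes(mutations, labels, pathway_genes)
```

In this toy input, `GATA3` is removed by the recurrence filter and `GENE_X` by the pathway filter, so only the remaining genes are ranked. A softer design would use pathway membership to reweight the mutual information score rather than as a hard filter.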

## Linked entities

- **Diseases:** breast cancer (MONDO:0004989)

## Full-text entities

- **Genes:** TP53 (tumor protein p53) [NCBI Gene 7157] {aka BCC7, BMFS5, LFS1, P53, TRP53}, ERBB2 (erb-b2 receptor tyrosine kinase 2) [NCBI Gene 2064] {aka CD340, HER-2, HER-2/neu, HER2, MLN 19, MLN-19}, GATA3 (GATA binding protein 3) [NCBI Gene 2625] {aka HDR, HDRS}, AKT1 (AKT serine/threonine kinase 1) [NCBI Gene 207] {aka AKT, PKB, PKB-ALPHA, PRKBA, RAC, RAC-ALPHA}, BRCA1 (BRCA1 DNA repair associated) [NCBI Gene 672] {aka BRCAI, BRCC1, BROVCA1, FANCS, IRIS, PNCA4}, PIK3CA (phosphatidylinositol-4,5-bisphosphate 3-kinase catalytic subunit alpha) [NCBI Gene 5290] {aka CCM4, CLAPO, CLOVE, CWS5, HMH, MCAP}, PIK3CB (phosphatidylinositol-4,5-bisphosphate 3-kinase catalytic subunit beta) [NCBI Gene 5291] {aka P110BETA, PI3K, PI3KBETA, PIK3C1}, TMEM43 (transmembrane protein 43) [NCBI Gene 79188] {aka ARVC5, ARVD5, AUNA3, EDMD7, EDMD7; AUNA2, LUMA}
- **Diseases:** Luminal B (MESH:D006509), Basal-like tumors (MESH:D009369), injury to (MESH:D014947), Breast Cancer (MESH:D001943)
- **Chemicals:** paraffin (MESH:D010232), formalin (MESH:D005557)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12939208/full.md

## Figures

5 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12939208/full.md

## References

48 references — full list in the complete paper: https://tomesphere.com/paper/PMC12939208/full.md

---
Source: https://tomesphere.com/paper/PMC12939208