# Gene driven analytical learning model for accurate breast cancer diagnosis

**Authors:** Farah Hesham, Mohammed M. Abbassy, Mohammed Abdalla

PMC · DOI: 10.1038/s41598-026-39430-6 · 2026-03-03

## TL;DR

This paper introduces a deep learning model combining CNN and BiLSTM to improve breast cancer diagnosis accuracy using gene expression data.

## Contribution

A novel hybrid CNN-BiLSTM model for breast cancer diagnosis, trained on a 236-gene set derived via Pearson correlation analysis.
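To make the gene-selection step concrete, here is a minimal sketch of correlation-based feature ranking. It is not the authors' pipeline: the source does not say what each gene is correlated against or how the cutoff of 236 genes was chosen, so this sketch assumes ranking by absolute Pearson correlation with the class label and a fixed top-`k`. The gene names, values, and the function names `pearson` and `select_genes` are hypothetical. Selection uses training rows only, which is one common way to avoid leaking held-out samples into feature selection.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # constant column: treat as uncorrelated
    return cov / math.sqrt(var_x * var_y)


def select_genes(train_rows, train_labels, gene_names, k):
    """Rank genes by |r| against the label (assumed criterion), keep the top k."""
    scores = []
    for j, name in enumerate(gene_names):
        column = [row[j] for row in train_rows]
        scores.append((abs(pearson(column, train_labels)), name))
    # Highest |r| first; ties broken alphabetically for determinism.
    scores.sort(key=lambda t: (-t[0], t[1]))
    return [name for _, name in scores[:k]]


# Toy training split (hypothetical values): rows are samples, columns are genes.
gene_names = ["GENE_A", "GENE_B", "GENE_C"]
train_labels = [0, 1, 0, 1, 0, 1]
train_rows = [
    [0.1, 0.5, 5.0],
    [0.9, 0.4, 5.0],
    [0.0, 0.6, 5.0],
    [1.1, 0.7, 5.0],
    [0.2, 0.3, 5.0],
    [1.0, 0.5, 5.0],
]

selected = select_genes(train_rows, train_labels, gene_names, k=2)
print(selected)  # ['GENE_A', 'GENE_B']
```

In this toy data `GENE_A` tracks the label closely, `GENE_B` only weakly, and `GENE_C` is constant, so the ranking drops `GENE_C` first.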

## Key findings

- The hybrid CNN-BiLSTM model achieved a Recall of 0.9943, compared with 0.9319 for the BiLSTM-only model.
- The model demonstrated an ROC AUC of 0.9955 and an F1 score of 0.9962.
- The framework remained robust under up to 20% noise perturbation, with a variance of 0.000083.
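For readers less familiar with the metrics reported above, the following sketch shows how Recall, F1, and ROC AUC are defined for a binary classifier. The labels and scores are toy values invented for illustration, not data from the paper, and the helper names are hypothetical.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels in {0, 1}."""
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            tp += 1
        elif t == 0 and p == 1:
            fp += 1
        elif t == 1 and p == 0:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def recall_score(tp, fn):
    """Recall = TP / (TP + FN): the share of true positives that were found."""
    return tp / (tp + fn) if tp + fn else 0.0


def f1_score(tp, fp, fn):
    """F1 = 2*TP / (2*TP + FP + FN): harmonic mean of precision and recall."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def roc_auc(y_true, scores):
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy example (hypothetical): 4 positives, 4 negatives, decision threshold 0.5.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2, 0.1, 0.6]
y_pred = [1 if s > 0.5 else 0 for s in scores]

tp, fp, fn, tn = confusion_counts(y_true, y_pred)
rec = recall_score(tp, fn)
f1 = f1_score(tp, fp, fn)
auc = roc_auc(y_true, scores)
print(rec, f1, auc)  # 0.75 0.75 0.9375
```

Recall is the metric the paper emphasizes, which fits a diagnostic setting where a missed positive case is typically the costlier error.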

## Abstract

Patients diagnosed with breast cancer exhibit a diverse range of prognostic outcomes due to the varied nature of the disease across different patient groups. To address this complexity and enhance prognostic predictions based on gene expression data from breast cancer samples, this study developed an integrated deep learning method that combines Convolutional Neural Networks (CNN) with Bidirectional Long Short-Term Memory (BiLSTM) networks. The automated pipeline uses Pearson correlation analysis to derive a reliable 236-gene set, ensuring no data contamination from patient samples. Patterns of gene–gene interactions based on correlations were also examined to provide further evidence of the biological relevance of the selected gene set. The proposed model was trained and validated on data from The Cancer Genome Atlas-Breast Cancer (TCGA-BRCA) and assessed on the METABRIC dataset to test its generalization. Experimental results indicate that the Full Hybrid (CNN-BiLSTM) model significantly outperforms other machine learning and deep learning approaches. Notably, while the BiLSTM-only model achieved an optimal Recall of 0.9319, the hybrid model demonstrated a substantially higher Recall of 0.9943, accompanied by an ROC AUC of 0.9955 and an F1 score of 0.9962. The proposed framework was also statistically validated, achieving a minimal variance of 0.000083 even under conditions of up to 20% noise perturbation. The framework was optimized using Optuna Bayesian optimization on a dual NVIDIA Tesla T4 configuration. Overall, this article presents a universal computational tool for precision medicine in breast cancer, designed to yield consistent results across diverse patient scenarios.
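The noise-perturbation check mentioned in the abstract can be illustrated with a small sketch. The abstract does not describe the noise model or which quantity the reported variance is computed over, so everything here is an assumption: zero-mean Gaussian noise scaled to each feature's standard deviation, noise levels from 0% to 20%, and variance taken over the Recall measured at each level. The `predict` function is a hypothetical stand-in classifier, not the CNN-BiLSTM model, and the data are toy values.

```python
import random
import statistics


def perturb(rows, level, rng):
    """Add zero-mean Gaussian noise with std = level * per-feature std (assumed noise model)."""
    n_features = len(rows[0])
    stds = [statistics.pstdev([row[j] for row in rows]) for j in range(n_features)]
    return [[x + rng.gauss(0.0, level * stds[j]) for j, x in enumerate(row)] for row in rows]


def predict(row):
    """Stand-in classifier (hypothetical): positive when the mean feature exceeds 0.5."""
    return 1 if sum(row) / len(row) > 0.5 else 0


def recall(y_true, y_pred):
    """Recall = TP / (TP + FN)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp / (tp + fn) if tp + fn else 0.0


# Toy expression matrix (hypothetical values): 4 positives near 1, 4 negatives near 0.
rows = [
    [0.9, 1.1, 1.0], [1.2, 0.8, 1.0], [1.0, 0.9, 1.1], [0.8, 1.0, 1.2],
    [0.1, 0.0, 0.2], [0.2, 0.1, 0.0], [0.0, 0.2, 0.1], [0.1, 0.1, 0.1],
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

rng = random.Random(42)  # fixed seed so the run is repeatable
levels = [0.0, 0.05, 0.10, 0.15, 0.20]
recalls = []
for level in levels:
    noisy = perturb(rows, level, rng)
    recalls.append(recall(labels, [predict(r) for r in noisy]))

# Population variance of Recall across noise levels: a small value suggests stability.
variance = statistics.pvariance(recalls)
print(recalls, variance)
```

A variance close to zero across levels would indicate that the metric barely moves as noise increases, which is the kind of stability the abstract reports.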

## Linked entities

- **Diseases:** breast cancer (MONDO:0004989)

## Full-text entities

- **Genes:** SLTM (SAFB like transcription modulator) [NCBI Gene 79811] {aka Met}, ERBB2 (erb-b2 receptor tyrosine kinase 2) [NCBI Gene 2064] {aka CD340, HER-2, HER-2/neu, HER2, MLN 19, MLN-19}, BRCA2 (BRCA2 DNA repair associated) [NCBI Gene 675] {aka BRCC2, BROVCA2, FACD, FAD, FAD1, FANCD}, G6PD (glucose-6-phosphate dehydrogenase) [NCBI Gene 2539] {aka CNSHA1, G6PD1}, PTEN (phosphatase and tensin homolog) [NCBI Gene 5728] {aka 10q23del, BZS, CWS1, DEC, GLM2, MHAM}, TP53 (tumor protein p53) [NCBI Gene 7157] {aka BCC7, BMFS5, LFS1, P53, TRP53}, BRCA1 (BRCA1 DNA repair associated) [NCBI Gene 672] {aka BRCAI, BRCC1, BROVCA1, FANCS, IRIS, PNCA4}, DUSP4 (dual specificity phosphatase 4) [NCBI Gene 1846] {aka HVH2, MKP-2, MKP2, TYP}, ESR1 (estrogen receptor 1) [NCBI Gene 2099] {aka ER, ESR, ESRA, ESTRR, Era, NR3A1}, ESR2 (estrogen receptor 2) [NCBI Gene 2100] {aka ER-BETA, ESR-BETA, ESRB, ESTRB, Erb, NR3A2}, MAP2K1 (mitogen-activated protein kinase kinase 1) [NCBI Gene 5604] {aka CFC3, MAPKK1, MEK1, MEL, MKK1, PRKMK1}, CDK19 (cyclin dependent kinase 19) [NCBI Gene 23097] {aka CDC2L6, CDK11, DEE87, EIEE87, bA346C16.3}, IMMT (inner membrane mitochondrial protein) [NCBI Gene 10989] {aka HMP, MICOS60, MINOS2, Mic60, P87, P87/89}
- **Diseases:** BRCA cancer (MESH:D001943), lung metastases (MESH:D009362), deaths (MESH:D003643), lung cancer (MESH:D008175), Cancer (MESH:D009369)
- **Chemicals:** ROS (-)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Figures

8 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12960951/full.md

---
Source: https://tomesphere.com/paper/PMC12960951