# Hybrid tuned deep learning model for breast cancer diagnosis using genetic data

**Authors:** Farah Hesham, Mohammed M. Abbassy, Mohammed Abdalla

PMC · DOI: 10.1038/s41598-026-41643-8 · 2026-03-21

## TL;DR

This paper introduces a hybrid deep learning model that uses genetic data for accurate breast cancer diagnosis and prognosis.

## Contribution

A novel hybrid deep learning model combining CNN and BiLSTM with Bayesian optimization for breast cancer prediction.
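To make the data flow of a CNN-BiLSTM hybrid concrete, the sketch below traces tensor shapes through a hypothetical stack in plain Python. The function names, the layer sizes (filter count, kernel width, pool size, LSTM units) and the 1,000-feature input are illustrative assumptions, not the paper's tuned values. In the paper, such hyperparameters are the kind of setting that Bayesian optimization would select.

```python
def conv1d_out_len(length, kernel, stride=1, padding=0):
    """Output length of a 1-D convolution or pooling window (no dilation)."""
    return (length + 2 * padding - kernel) // stride + 1


def trace_shapes(n_features, filters=64, kernel=3, pool=2,
                 lstm_units=32, n_classes=2):
    """Return (layer name, output shape) pairs for a single sample.

    All layer sizes are hypothetical defaults chosen for illustration.
    """
    # Selected features are treated as a sequence with one channel.
    shapes = [("input", (n_features, 1))]
    conv_len = conv1d_out_len(n_features, kernel)
    shapes.append(("conv1d", (conv_len, filters)))
    pool_len = conv1d_out_len(conv_len, pool, stride=pool)
    shapes.append(("maxpool", (pool_len, filters)))
    # A bidirectional LSTM concatenates its forward and backward final states.
    shapes.append(("bilstm", (2 * lstm_units,)))
    shapes.append(("dense", (n_classes,)))
    return shapes


for name, shape in trace_shapes(1000):
    print(f"{name:8s} {shape}")
```

The convolution extracts local patterns across neighbouring features, and the BiLSTM then reads the pooled sequence in both directions, which is why its output width is twice the number of LSTM units.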

## Key findings

- The model achieved 97.4% accuracy (AUC=0.995) on the TCGA dataset.
- Validation on METABRIC showed 99.30% accuracy and 100% recall for predicting cancer-related mortality.
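- The model outperforms traditional methods, which rely on literature-selected gene sets, by using high-dimensional genetic and clinical data (17,814 genes in TCGA; 503 clinical/genomic features in METABRIC).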
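
The three metrics reported above (accuracy, recall, AUC) can be computed as in the following minimal sketch. The labels and scores are made-up toy values, not results from the paper, and the helper names are hypothetical.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def recall(y_true, y_pred, positive=1):
    """True positives divided by all actual positives."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    return tp / (tp + fn) if (tp + fn) else 0.0


def roc_auc(y_true, scores, positive=1):
    """AUC as the probability that a random positive outranks a random
    negative; tied scores count as half a win."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data for illustration only.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.9, 0.8, 0.4, 0.3, 0.2, 0.6, 0.1, 0.7]
y_pred = [1 if s >= 0.5 else 0 for s in scores]

print(accuracy(y_true, y_pred))  # 0.75
print(recall(y_true, y_pred))    # 0.75
print(roc_auc(y_true, scores))   # 0.9375
```

A recall of 100% means that no positive case was missed (zero false negatives). It says nothing about false positives, so it is best read alongside accuracy and AUC.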

## Abstract

Early diagnosis and prognosis of breast cancer are essential for improving survival rates and clinical outcomes. This study develops and applies a robust hybrid computational prediction methodology for breast cancer, tested across multiple whole-genome studies and validated on both TCGA (The Cancer Genome Atlas) and METABRIC (Molecular Taxonomy of Breast Cancer International Consortium). Instead of following the traditional approach, in which researchers select specific gene sets from the literature, we operate on the highest-dimensional input (17,814 genes in TCGA) and the most extensive set of clinical and genomic variables available (503 clinical/genomic features in METABRIC). A multi-stage feature selection process that combines Random Forest (RF) rankings with Association Rule Mining (ARM) was developed to discover important biomarkers. Predictive analysis was performed with a hybrid deep learning model that combines Convolutional Neural Networks (CNN) with Bidirectional Long Short-Term Memory (BiLSTM) networks and is tuned iteratively using Bayesian optimization. SMOTE and Gaussian noise augmentation were incorporated to improve robustness by addressing class imbalance and reducing the risk of overfitting to noise in the training data. The model achieved an accuracy of 97.4% (AUC=0.995) on TCGA and, after validation on the METABRIC dataset, an even higher accuracy of 99.30% with 100% recall for predicting cancer-related mortality. These findings show that integrating association-based feature selection with hybrid deep learning architectures yields a tool for breast cancer diagnosis and prognosis that can provide reliable and generalizable results for diverse groups of patients.
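
The two augmentation steps named in the abstract, SMOTE and Gaussian noise, can be illustrated with a simplified stdlib-only sketch. This is not the authors' pipeline or a library implementation: `smote_like` and `add_gaussian_noise` are hypothetical helpers, and the neighbour count `k`, the noise level `sigma`, and the toy rows are assumptions chosen for illustration.

```python
import random


def smote_like(minority, n_new, k=2, rng=None):
    """Create n_new synthetic rows by interpolating a randomly chosen
    minority row toward one of its k nearest minority neighbours.
    Requires at least two minority rows."""
    rng = rng or random.Random(0)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other rows by squared Euclidean distance to the base row.
        others = [row for row in minority if row is not base]
        others.sort(key=lambda row: sum((a - b) ** 2 for a, b in zip(base, row)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position on the segment from base to neighbour
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic


def add_gaussian_noise(rows, sigma=0.01, rng=None):
    """Return a copy of rows with zero-mean Gaussian noise added per value."""
    rng = rng or random.Random(0)
    return [[v + rng.gauss(0.0, sigma) for v in row] for row in rows]


# Toy minority-class samples (two features each), for illustration only.
minority = [[1.0, 2.0], [1.5, 1.8], [2.0, 2.5], [1.2, 2.2]]
rng = random.Random(42)
synthetic = smote_like(minority, n_new=4, k=2, rng=rng)
augmented = add_gaussian_noise(minority + synthetic, sigma=0.05, rng=rng)
print(len(synthetic), len(augmented))
```

Interpolation adds minority-class samples to counter class imbalance, while the small perturbations discourage the model from memorizing exact training values. Both steps should be applied to the training split only, so that evaluation data stays untouched.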

## Linked entities

- **Diseases:** breast cancer (MONDO:0004989)

## Full-text entities

- **Genes:** BRCA1 (BRCA1 DNA repair associated) [NCBI Gene 672] {aka BRCAI, BRCC1, BROVCA1, FANCS, IRIS, PNCA4}, ATP8B1 (ATPase phospholipid transporting 8B1) [NCBI Gene 5205] {aka ATPIC, BRIC, FIC1, ICP1, PFIC, PFIC1}, BRIP1 (BRCA1 interacting DNA helicase 1) [NCBI Gene 83990] {aka BACH1, FANCJ, OF}, PTEN (phosphatase and tensin homolog) [NCBI Gene 5728] {aka 10q23del, BZS, CWS1, DEC, GLM2, MHAM}, POM121L8P (POM121 transmembrane nucleoporin like 8, pseudogene) [NCBI Gene 29797] {aka DKFZp434K191}, CDH1 (cadherin 1) [NCBI Gene 999] {aka Arc-1, BCDS1, CD324, CDHE, ECAD, LCAM}, TP53 (tumor protein p53) [NCBI Gene 7157] {aka BCC7, BMFS5, LFS1, P53, TRP53}, ZNF416 (zinc finger protein 416) [NCBI Gene 55659], MOP-1 [NCBI Gene 643616], E2F7 (E2F transcription factor 7) [NCBI Gene 144455], ATM (ATM serine/threonine kinase) [NCBI Gene 472] {aka AT1, ATA, ATC, ATD, ATDC, ATE}, PCLAF (PCNA clamp associated factor) [NCBI Gene 9768] {aka KIAA0101, L5, NS5ATP9, OEATC, OEATC-1, OEATC1}, Col10a1 (collagen, type X, alpha 1) [NCBI Gene 12813] {aka Col10, Col10a-1}, FUBP1 (far upstream element binding protein 1) [NCBI Gene 8880] {aka FBP, FUBP, hDH V}, CTPS1 (CTP synthase 1) [NCBI Gene 1503] {aka CTPS, GATD5, GATD5A, IMD24}, PALB2 (partner and localizer of BRCA2) [NCBI Gene 79728] {aka BROVCA5, FANCN, PNCA3}, MYC (MYC proto-oncogene, bHLH transcription factor) [NCBI Gene 4609] {aka MRTL, MYCC, bHLHe39, c-Myc}, CENPF (centromere protein F) [NCBI Gene 1063] {aka CENF, CILD31, PRO1779, STROMS, hcp-1}, UBE2T (ubiquitin conjugating enzyme E2 T) [NCBI Gene 29089] {aka FANCT, HSPC150, PIG50}, PCDHGA7 (protocadherin gamma subfamily A, 7) [NCBI Gene 56108] {aka PCDH-GAMMA-A7}, STK11 (serine/threonine kinase 11) [NCBI Gene 6794] {aka LKB1, PJS, hLKB1}, ERBB2 (erb-b2 receptor tyrosine kinase 2) [NCBI Gene 2064] {aka CD340, HER-2, HER-2/neu, HER2, MLN 19, MLN-19}, CHEK2 (checkpoint kinase 2) [NCBI Gene 11200] {aka CDS1, CHK2, HuCds1, LFS2, PP1425, RAD53}, GPRIN1 (G protein regulated inducer of neurite outgrowth 1) [NCBI Gene 114787] {aka GRIN1}
- **Diseases:** ARM (MESH:D018886), lymphatic malignancy (MESH:D008206), cervical cancer (MESH:D002583), BC (MESH:D001943), skin lesion (MESH:D012871), OSCC (MESH:D000077195), Fanconi anaemia (MESH:D000743), obesity (MESH:D009765), Tumor (MESH:D009369), tumorigenesis (MESH:D063646), cancers of the lung and colon (MESH:D008175), deaths (MESH:D003643), metastasis (MESH:D009362)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Figures

6 figures with captions in the complete paper: https://tomesphere.com/paper/PMC13009494/full.md

---
Source: https://tomesphere.com/paper/PMC13009494