# Development and validation of a highly accurate multigene gene expression biomarker to predict chemotherapy response in primary triple-negative breast cancer

**Authors:** Soukaina Amniouel, Mohsin Saleet Jafri

PMC · DOI: 10.1007/s10549-026-07950-4 · Breast Cancer Research and Treatment · 2026-03-26

## TL;DR

This study developed a multigene biomarker using machine learning to predict chemotherapy response in triple-negative breast cancer patients, aiming to improve treatment outcomes.

## Contribution

A novel multigene biomarker and machine learning models for predicting neoadjuvant chemotherapy response in triple-negative breast cancer.

## Key findings

- 21 overlapping biomarkers (genes retained by both feature-selection methods) were identified, including EPHB3 and VEGFA; several have been implicated in TNBC progression and treatment resistance.
- Machine learning models achieved strong predictive performance with AUC values of 91% for random forest and 89% for SVM in the test set.

## Abstract

Triple-negative breast cancer (TNBC) is an aggressive subtype lacking estrogen and progesterone receptors and HER2 amplification. Representing 10–15% of breast cancer cases, TNBC disproportionately affects Black and pre-menopausal women and is associated with poorer outcomes. With chemotherapy as the primary systemic treatment option, achieving a pathological complete response (pCR) to neoadjuvant chemotherapy (NAC) is a key prognostic factor. However, the biological heterogeneity of TNBC complicates prediction of treatment response. This study aimed to identify transcriptomic biomarkers predictive of NAC response in TNBC patients and evaluate machine-learning models for response classification.

We performed transcriptomic profiling on tumors from 234 TNBC patients, divided into a training cohort (138 pCR, 72 residual disease [RD]) and a test cohort (9 pCR, 15 RD). Feature selection was conducted using LASSO regression and the Boruta algorithm to identify robust biomarkers. Random forest and support vector machine (SVM) models were trained on the selected features and evaluated on the independent test set.
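The feature-selection step described above can be sketched as follows. This is a minimal, self-contained illustration under stated assumptions, not the authors' pipeline: it fits a least-squares LASSO by cyclic coordinate descent on standardized expression values with 0/1 labels (1 = pCR, 0 = RD), a simplification of the L1-penalized logistic regression often used for binary outcomes, and keeps genes with nonzero coefficients. Boruta (which wraps a random forest and compares each gene's importance against shuffled "shadow" copies) is not reproduced; its output is stubbed as a hypothetical set, and "overlapping" is read as the intersection of the two selections. The function names, the penalty `lam`, and the toy data are all hypothetical; the summary does not say how the study tuned the penalty (cross-validation would be typical).

```python
def standardize(X):
    """Center each column to mean 0 and scale it to unit (population) variance."""
    n, p = len(X), len(X[0])
    out = [[0.0] * p for _ in range(n)]
    for j in range(p):
        mean = sum(row[j] for row in X) / n
        sd = (sum((row[j] - mean) ** 2 for row in X) / n) ** 0.5
        for i in range(n):
            out[i][j] = (X[i][j] - mean) / sd if sd > 0 else 0.0
    return out


def soft_threshold(z, gamma):
    """S(z, gamma) = sign(z) * max(|z| - gamma, 0), the LASSO shrinkage operator."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def lasso_coordinate_descent(Xs, y, lam, n_iter=100):
    """Minimize (1/2n)||y - b0 - Xs b||^2 + lam * ||b||_1 by cyclic coordinate descent.

    Xs must have standardized (centered) columns, so the intercept is simply mean(y).
    """
    n, p = len(Xs), len(Xs[0])
    b0 = sum(y) / n
    b = [0.0] * p
    residual = [yi - b0 for yi in y]
    for _ in range(n_iter):
        for j in range(p):
            col = [row[j] for row in Xs]
            norm_sq = sum(v * v for v in col) / n
            if norm_sq == 0.0:
                continue  # constant gene: carries no information
            # Correlation of gene j with the partial residual (its own term added back).
            rho = sum(c * r for c, r in zip(col, residual)) / n + norm_sq * b[j]
            new_bj = soft_threshold(rho, lam) / norm_sq
            delta = new_bj - b[j]
            if delta != 0.0:
                residual = [r - delta * c for r, c in zip(residual, col)]
                b[j] = new_bj
    return b0, b


def lasso_select(X, y, names, lam):
    """Genes whose LASSO coefficient is nonzero at penalty lam."""
    _, coefs = lasso_coordinate_descent(standardize(X), y, lam)
    return {name for name, c in zip(names, coefs) if c != 0.0}


def overlap(lasso_genes, boruta_genes):
    """The 'overlapping' set: genes retained by both selectors."""
    return sorted(set(lasso_genes) & set(boruta_genes))


# Toy data (made up, not from the study): gene_A tracks the label exactly.
names = ["gene_A", "gene_B", "gene_C"]
y = [1, 1, 1, 1, 0, 0, 0, 0]  # 1 = pCR, 0 = RD
X = [list(t) for t in zip(y, [1, 0] * 4, [1, 1, 0, 0] * 2)]

lasso_genes = lasso_select(X, y, names, lam=0.1)
boruta_genes = {"gene_A", "gene_C"}  # hypothetical Boruta output, not computed here
print(overlap(lasso_genes, boruta_genes))  # ['gene_A']
```

Raising `lam` shrinks more coefficients to exactly zero; in this toy case a penalty above 0.5 removes every gene, which is why the penalty choice largely determines how many candidates reach the intersection step.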

Feature selection identified 21 overlapping biomarkers, including EPHB3, ATP5MJ, USP1, RANBP9, SLC11A2, S100P, PPP1R1A, ZIC1, NDRG2, SMARCA2, H2BC7, STK24, HBB, VPS45, H1, VEGFA, NFIB, ITGA6, RPRD1A, PRKD3, and ENSA, several of which have been implicated in TNBC progression and treatment resistance. In the test set, predictive performance was strong, with area under the curve (AUC) values of 91% for random forest and 89% for SVM.
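For reading the reported AUCs, it can help to recall the rank-based definition: the ROC AUC equals the probability that a randomly chosen pCR patient receives a higher predicted score than a randomly chosen RD patient (the Mann–Whitney U statistic scaled by the number of pairs). Assuming the AUC was computed over the whole 24-patient test set, there are 9 × 15 = 135 pCR/RD pairs, so an AUC of 0.91 corresponds to roughly 123 of those pairs being ranked correctly. The sketch below is a plain-Python illustration of that definition; the function name and the scores are hypothetical, and the summary does not state which software the study used.

```python
from itertools import product


def roc_auc(scores_pos, scores_neg):
    """ROC AUC as P(score of a random positive > score of a random negative).

    Ties count as half a win, matching the Mann-Whitney U statistic.
    """
    if not scores_pos or not scores_neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for sp, sn in product(scores_pos, scores_neg):
        if sp > sn:
            wins += 1.0
        elif sp == sn:
            wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


# Made-up predicted pCR probabilities (not study data).
pcr_scores = [0.9, 0.8, 0.4]
rd_scores = [0.7, 0.4, 0.1]
print(round(roc_auc(pcr_scores, rd_scores), 3))  # 0.833
```

The pairwise loop is O(n_pos × n_neg), which is trivial for a 24-patient test set; for large cohorts a sort-based rank computation gives the same value faster.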

Transcriptomic profiling combined with machine learning provides a promising approach for predicting NAC response in TNBC. The identified biomarkers may inform precision treatment strategies and improve clinical outcomes in this high-risk patient population.

The online version contains supplementary material available at 10.1007/s10549-026-07950-4.

## Linked entities

- **Genes:** EPHB3 (EPH receptor B3) [NCBI Gene 2049], ATP5MJ (ATP synthase membrane subunit j) [NCBI Gene 9556], USP1 (ubiquitin specific peptidase 1) [NCBI Gene 7398], RANBP9 (RAN binding protein 9) [NCBI Gene 10048], SLC11A2 (solute carrier family 11 member 2) [NCBI Gene 4891], S100P (S100 calcium binding protein P) [NCBI Gene 6286], PPP1R1A (protein phosphatase 1 regulatory inhibitor subunit 1A) [NCBI Gene 5502], ZIC1 (Zic family zinc finger 1) [NCBI Gene 7545], NDRG2 (NDRG family member 2) [NCBI Gene 57447], SMARCA2 (SWI/SNF related BAF chromatin remodeling complex subunit ATPase 2) [NCBI Gene 6595], H2BC7 (H2B clustered histone 7) [NCBI Gene 8343], STK24 (serine/threonine kinase 24) [NCBI Gene 8428], HBB (hemoglobin subunit beta) [NCBI Gene 3043], VPS45 (vacuolar protein sorting 45 homolog) [NCBI Gene 11311], H1-5 (H1.5 linker histone, cluster member) [NCBI Gene 3009], VEGFA (vascular endothelial growth factor A) [NCBI Gene 7422], NFIB (nuclear factor I B) [NCBI Gene 4781], ITGA6 (integrin subunit alpha 6) [NCBI Gene 3655], RPRD1A (regulation of nuclear pre-mRNA domain containing 1A) [NCBI Gene 55197], PRKD3 (protein kinase D3) [NCBI Gene 23683], ENSA (endosulfine alpha) [NCBI Gene 2029]
- **Diseases:** triple-negative breast cancer (MONDO:0005494), breast cancer (MONDO:0004989)

## Full-text entities

- **Genes:** BMAL2 (basic helix-loop-helix ARNT like 2) [NCBI Gene 56938] {aka ARNTL2, CLIF, MOP9, PASD9, bHLHe6}, ESR1 (estrogen receptor 1) [NCBI Gene 2099] {aka ER, ESR, ESRA, ESTRR, Era, NR3A1}, NDRG2 (NDRG family member 2) [NCBI Gene 57447] {aka SYLD}, PDK3 (pyruvate dehydrogenase kinase 3) [NCBI Gene 5165] {aka CMTX6, GS1-358P8.4}, HGF (hepatocyte growth factor) [NCBI Gene 3082] {aka DFNB39, F-TCF, HGFB, HPTA, SF}, PER2 (period circadian regulator 2) [NCBI Gene 8864] {aka FASPS, FASPS1}, Akt1 (Akt serine/threonine kinase 1) [NCBI Gene 11651] {aka Akt, LTR-akt, PKB, PKB/Akt, PKBalpha, Rac}, EHF (ETS homologous factor) [NCBI Gene 26298] {aka ESE3, ESE3B, ESEJ}, PGR (progesterone receptor) [NCBI Gene 5241] {aka NR3C3, PR}, MTOR (mechanistic target of rapamycin kinase) [NCBI Gene 2475] {aka FRAP, FRAP1, FRAP2, RAFT1, RAPT1, SKS}, NFE2L2 (NFE2 like bZIP transcription factor 2) [NCBI Gene 4780] {aka IMDDHH, NRF2, Nrf-2}, TFEC (transcription factor EC) [NCBI Gene 22797] {aka TCFEC, TFE-C, TFEC-L, TFECL, bHLHe34, hTFEC-L}, ITGA6 (integrin subunit alpha 6) [NCBI Gene 3655] {aka CD49f, ITGA6A, ITGA6B, JEB6, VLA-6}, EREG (epiregulin) [NCBI Gene 2069] {aka EPR, ER, Ep}, MYC (MYC proto-oncogene, bHLH transcription factor) [NCBI Gene 4609] {aka MRTL, MYCC, bHLHe39, c-Myc}, CDK12 (cyclin dependent kinase 12) [NCBI Gene 51755] {aka CRK7, CRKR, CRKRS}, CRY2 (cryptochrome circadian regulator 2) [NCBI Gene 1408] {aka HCRY2, PHLL2}, IGF1R (insulin like growth factor 1 receptor) [NCBI Gene 3480] {aka CD221, IGFIR, IGFR, JTK13}, VEGFA (vascular endothelial growth factor A) [NCBI Gene 7422] {aka L-VEGF, MVCD1, VEGF, VPF}, Mtor (mechanistic target of rapamycin kinase) [NCBI Gene 56717] {aka 2610315D21Rik, FRAP, FRAP2, Frap1, RAFT1, RAPT1}, CLOCK (clock circadian regulator) [NCBI Gene 9575] {aka KAT13D, bHLHe8}, SLC11A2 (solute carrier family 11 member 2) [NCBI Gene 4891] {aka AHMIO1, DCT1, DMT1, NRAMP2}, S100P (S100 calcium binding protein P) [NCBI Gene 6286] {aka MIG9}, EPHB3 (EPH 
receptor B3) [NCBI Gene 2049] {aka EK2, ETK2, HEK2, TYRO6}, BMAL1 (basic helix-loop-helix ARNT like 1) [NCBI Gene 406] {aka ARNTL, ARNTL1, BMAL1c, JAP3, MOP3, PASD3}, PRKD3 (protein kinase D3) [NCBI Gene 23683] {aka EPK2, PKC-NU, PKD3, PRKCN, nPKC-NU}, Cdkn1a (cyclin dependent kinase inhibitor 1A) [NCBI Gene 12575] {aka CAP20, CDKI, CIP1, Cdkn1, P21, SDI1}, DEK (DEK proto-oncogene) [NCBI Gene 7913] {aka D6S231E}, STK24 (serine/threonine kinase 24) [NCBI Gene 8428] {aka HEL-S-95, MST3, MST3B, STE20, STK3}, NFIB (nuclear factor I B) [NCBI Gene 4781] {aka CTF, HMGIC/NFIB, MACID, NF-I/B, NF1-B, NFI-B}, TRA2B (transformer 2 beta homolog) [NCBI Gene 6434] {aka Htra2-beta, PPP1R156, RAMELN, SFRS10, SRFS10, TRA2-BETA}, PIK3CB (phosphatidylinositol-4,5-bisphosphate 3-kinase catalytic subunit beta) [NCBI Gene 5291] {aka P110BETA, PI3K, PI3KBETA, PIK3C1}, PPP1R1A (protein phosphatase 1 regulatory inhibitor subunit 1A) [NCBI Gene 5502] {aka I1, IPP1}, CETN1 (centrin 1) [NCBI Gene 1068] {aka CEN1, CETN}, RANBP9 (RAN binding protein 9) [NCBI Gene 10048] {aka BPM-L, BPM90, RANBPM, RanBP7}, AKT1 (AKT serine/threonine kinase 1) [NCBI Gene 207] {aka AKT, PKB, PKB-ALPHA, PRKBA, RAC, RAC-ALPHA}, KEAP1 (kelch like ECH associated protein 1) [NCBI Gene 9817] {aka INrf2, KLHL19}, PARP1 (poly(ADP-ribose) polymerase 1) [NCBI Gene 142] {aka ADPRT, ADPRT 1, ADPRT1, ARTD1, PARP, PARP-1}, H1-2 (H1.2 linker histone, cluster member) [NCBI Gene 3006] {aka H1.2, H1C, H1F2, H1s-1, HIST1H1C}, SMAD4 (SMAD family member 4) [NCBI Gene 4089] {aka DPC4, JIP, MADH4, MYHRS}, CDKN2A (cyclin dependent kinase inhibitor 2A) [NCBI Gene 1029] {aka ARF, CAI2, CDK4I, CDKN2, CMM2, INK4}, NRP2 (neuropilin 2) [NCBI Gene 8828] {aka NP2, NPN2, PRO2714, VEGF165R2}, Trp53 (transformation related protein 53) [NCBI Gene 22059] {aka Tp53, bbl, bfy, bhy, p44, p53}, MIR375 (microRNA 375) [NCBI Gene 494324] {aka MIRN375, hsa-mir-375, miRNA375, mir-375}, SMARCA2 (SWI/SNF related BAF chromatin remodeling complex subunit ATPase 2) 
[NCBI Gene 6595] {aka BAF190, BIS, BRM, NCBRS, SAMRCA2, SNF2}, ERBB2 (erb-b2 receptor tyrosine kinase 2) [NCBI Gene 2064] {aka CD340, HER-2, HER-2/neu, HER2, MLN 19, MLN-19}, NFIL3 (nuclear factor, interleukin 3 regulated) [NCBI Gene 4783] {aka E4BP4, IL3BP1, NF-IL3A, NFIL3A}, HAT1 (histone acetyltransferase 1) [NCBI Gene 8520] {aka KAT1}, PCNA (proliferating cell nuclear antigen) [NCBI Gene 5111] {aka ATLD2}, LRPPRC (leucine rich pentatricopeptide repeat containing) [NCBI Gene 10128] {aka CLONE-23970, GP130, LRP130, LSFC, MC4DN5}, BNC1 (basonuclin zinc finger protein 1) [NCBI Gene 646] {aka BNC, BSN1, HsT19447, POF16, bn1}, PER1 (period circadian regulator 1) [NCBI Gene 5187] {aka PER, RIGUI, hPER}, AGER (advanced glycosylation end-product specific receptor) [NCBI Gene 177] {aka RAGE, SCARJ1, sRAGE}, Birc5 (baculoviral IAP repeat-containing 5) [NCBI Gene 11799] {aka AAC-11, Api4, TIAP, survivin40}, NPAS2 (neuronal PAS domain protein 2) [NCBI Gene 4862] {aka MOP4, PASD4, bHLHe9}, VPS45 (vacuolar protein sorting 45 homolog) [NCBI Gene 11311] {aka H1, H1VPS45, SCN5, VPS45A, VPS45B, VPS54A}, USP1 (ubiquitin specific peptidase 1) [NCBI Gene 7398] {aka UBP}, IL12B (interleukin 12B) [NCBI Gene 3593] {aka CLMF, CLMF2, IL-12B, IMD28, IMD29, NKSF}, NFKB1 (nuclear factor kappa B subunit 1) [NCBI Gene 4790] {aka CVID12, EBP-1, KBF1, NF-kB, NF-kB1, NF-kappa-B1}, ERCC1 (ERCC excision repair 1, endonuclease non-catalytic subunit) [NCBI Gene 2067] {aka COFS4, RAD10, UV20}
- **Diseases:** Tumors (MESH:D009369), pCR (MESH:D005598), RD (MESH:D018365), T-cell leukemia virus 1 (MESH:D015458), amyotrophic lateral sclerosis (MESH:D000690), HR-deficient (MESH:C535296), hyperglycemia (MESH:D006943), breast cancer (MESH:D001943), diabetic cardiomyopathy (MESH:D058065), carcinogenesis (MESH:D063646), drug resistance (MESH:D000069279), TNBC (MESH:D064726), disease (MESH:D004194), metastases (MESH:D009362), mitochondrial dysfunction (MESH:D028361), toxicity (MESH:D064420), alcoholism (MESH:D000437)
- **Chemicals:** ROS (MESH:D017382), taxane (MESH:C080625), melatonin (MESH:D008550), ATP (MESH:D000255), cyclophosphamide (MESH:D003520), Taxol (MESH:D017239), TFAC (-), Anthracycline (MESH:D018943), Fluorouracil (MESH:D005472), docetaxel (MESH:D000077143), Epirubicin (MESH:D015251), platinum (MESH:D010984)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC13021716/full.md

## Figures

7 figures with captions in the complete paper: https://tomesphere.com/paper/PMC13021716/full.md

## References

1 reference listed; see the complete paper: https://tomesphere.com/paper/PMC13021716/full.md

---
Source: https://tomesphere.com/paper/PMC13021716