Paired-Box (PAX) Gene Signatures as a Biomarker of Breast Cancer Progression
Manuel Scimeca, Maria Paola Scioli, Valeria Palumbo, Lukas Funke, Jonathan Woodsmith, Francesca Servadei, Erica Giacobbi, Christian Seghetti, Oreste Claudio Buonomo, Eleonora Candi, Michele Treglia, Luigi Tonino Marsella, Gerry Melino, Alessandro Mauriello, Rita Bonfiglio

TL;DR
This paper explores how specific PAX genes are linked to breast cancer progression and could serve as biomarkers for predicting metastasis and treatment response.
Contribution
The study identifies novel PAX gene signatures associated with breast cancer hallmarks and metastasis prediction.
Findings
PAX1 and PAX9 correlate with both cell death and proliferation markers.
PAX3, PAX5, and PAX8 are linked to immune checkpoint expression like PD-L1 and TIGIT.
PAX2–PAX7 signatures accurately predict lymph node metastasis at the transcriptomic level.
Abstract
Breast cancer is the leading cause of cancer-related death in women, and despite advances in preventive screening as well as in molecular classification, many patients still do not benefit from existing therapies, highlighting the importance of identifying new molecular determinants of treatment resistance. The Paired-box (PAX) family of developmental transcription factors are central regulators of tissue morphogenesis and lineage specification, yet their reactivation in tumors and contribution to breast cancer progression remain only partially defined. Here, a multi-level analysis integrating RNA sequencing and protein profiling in twenty-one primary breast carcinomas shows that distinct PAX members are directly correlated to distinct fundamental cancer hallmarks, including proliferation, cell death, epithelial–mesenchymal transition, immune evasion, and genomic instability.…
Genes, proteins, chemicals, diseases, species, mutations and cell lines named across the full text — each resolved to its canonical identifier and authoritative record.
Click any figure to enlarge with its caption.
Figure 1
Figure 2
Figure 3
Figure 4
Figure 5
Figure 6- —Ministry of Health—HUB LIFE SCIENCE—Advanced Diagnostic–Italian network of excellence for advanced diagnosis
- —European Union NextGenerationEU
- —Associazione Italiana per la Ricerca contro il Cancro (AIRC)
- —RC/Fondazione Luigi Maria Monti
- —“Ministero dell’Università e della Ricerca” PRIN2022
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsDevelopmental Biology and Gene Regulation · Cancer Cells and Metastasis · FOXO transcription factor regulation
1. Introduction
Breast cancer is the most commonly diagnosed malignancy and the leading cause of cancer-related death among women worldwide [1,2]. Despite significant advances in early detection and the development of molecular targeted therapies, including endocrine modulators, anti-HER2 agents, and CDK4/6 inhibitors [3,4,5], a substantial proportion of patients suffer from disease recurrence, distant metastases, or resistance to standard treatments [6]. This heterogeneity of clinical responses underscores a persistent gap in our understanding of the molecular determinants of tumor progression and therapeutic response [7,8,9,10]. As a result, there is an urgent need to identify novel and reliable biomarkers that can enhance prognostic accuracy and guide individualized treatment decisions beyond current molecular classifications [11,12].
In recent years, developmental transcription factors have emerged as central players in oncogenesis, driving processes such as cellular plasticity, immune modulation, and metastatic spread.
The Paired-box (PAX) gene family, comprising nine evolutionarily conserved genes critical for embryonic development and tissue morphogenesis, has attracted growing attention in cancer biology [13,14,15,16]. In physiological contexts, PAX genes regulate lineage commitment, stem cell maintenance, and epithelial–mesenchymal dynamics. However, their ectopic reactivation in adult tissues, particularly within tumors, suggests a potential role in reprogramming differentiation pathways and promoting malignant phenotypes [13,14,15,16]. Although various studies have implicated specific PAX genes as context-dependent oncogenes or tumor suppressors [17,18], current evidence in human breast cancer remains limited. PAX5, traditionally known for its role in B-cell lineage commitment, has been implicated in regulating epithelial-to-mesenchymal transition (EMT) and tumor invasiveness through epigenetic regulation and miRNA-mediated networks [19,20]. Its downregulation, often through promoter hypermethylation, has been associated with aggressive tumor phenotypes and loss of epithelial markers such as E-cadherin [19,20]. Conversely, PAX6 displays a tumor suppressor or oncogene depending on the molecular context [21]. However, most existing data about the role of PAX genes in breast cancer progression are fragmented, relying on single-gene approaches or transcriptomic meta-analyses, often without direct validation at the protein level. Moreover, the functional implications of PAX expression in the tumor immune landscape, epithelial plasticity, and clinical behaviour remain poorly defined. To address this gap, we performed a multi-omic investigation of PAX gene expression across transcriptomic and protein levels in twenty-one primary breast carcinomas. By correlating PAX signatures with distinct cancer hallmarks, we provide evidence for a potential role for specific PAX members in modulating the tumor microenvironment and metastatic dissemination. We also uncover a transcriptional and immunohistochemical PAX-based signature with high predictive accuracy for lymph node metastasis, underscoring their potential as clinically reliable biomarkers.
2. Results
2.1. Histopathological Assessment
Twenty-one invasive breast cancer cases from female patients aged 41 to 87 years (mean ± SD: 58.7 ± 3.0 years) have been collected at the University of Rome Tor Vergata. According to Nottingham grading, invasive ductal carcinoma has been classified as G3 (15/21), G2 (4/21) and G1 (1/21). One case was classified as mixed invasive carcinoma with lobular and ductal mixed pattern. Molecular classification by the PAM-50 multigene test allowed us to identify the following subtypes: 4/21 Luminal-A, 11/21 Luminal-B, 2/21 Basal-like, 1/21 Normal-like and 3/21 Unclear.
2.2. Systematic PAX Expression and Prognostic Association Analysis
To investigate the potential role of PAX family members in breast cancer (Figure 1), a bioinformatic analysis was performed using the UCSC Xena tool based on TCGA Breast Cancer (BRCA) cohort data [22]. The data showed that PAX3, PAX6 and PAX8 expressions are associated with overall survival (OS) in breast cancer patients. Specifically, higher expression of PAX3 (Figure 1C) and PAX6 is negatively associated with OS (Figure 1F), whereas patients with high levels of PAX8 expression exhibit improved OS. No difference in OS was observed for the expression of PAX1, PAX2, PAX4, PAX5, PAX7 and PAX9.
Analysis of PAX family member expression in normal, tumoral, and metastatic breast tissues revealed a significant increase in PAX1 (Figure 1A), PAX2 (Figure 1B), and PAX9 (Figure 1I) in tumor samples compared to normal ones. Noteworthy, PAX5 (Figure 1E) expression was higher in metastatic tissues as compared to both normal and tumoral samples.
2.3. PAXs Gene Expression and Key Molecular Features of Breast Cancers: Cell Proliferation and Cell Death
To investigate the potential role of PAX genes in breast cancer, RNA-seq data from all 21 tumour samples were analysed and compared with key molecular features of tumour lesions, including genes involved in immune evasion, cell death pathways, EMT, hypoxia, and the tumour microenvironment. Thus, we aimed to validate the observed associations in an independent cohort and to explore potential links between PAX gene expression and major biological processes underlying tumour progression and therapeutic resistance.
As a results, PAX1 and PAX9 displayed significant positive correlations with key markers related to cell death such as BARD1 (PAX1: ρ = 0.66, p = 0.002; PAX9: ρ = 0.66, p = 0.001) and BAX (PAX1: ρ = 0.58, p = 0.008; PAX9: ρ = 0.41, p = 0.064). These associations suggest that PAX proteins may actively influence cell fate decisions, balancing apoptotic and survival pathways within the tumor microenvironment [23]. In addition, BAX expression correlated positively with PAX5, further implicating this transcription factor in the modulation of pro-apoptotic signalling (Figure 2). Notably, PAX9 expression showed a significant positive association with both hypoxia pan cancer (ρ = 0.55, p = 0.01) and proliferation scores (ρ = 0.54, p = 0.012) (Figure 2), highlighting its dual involvement in sustaining tumor growth under stress conditions (Figure 2 and Figure 3).
2.4. PAXs Expression and Immune Evasion
Several PAX family members showed significant correlations with key immune checkpoint molecules (Figure 3). Specifically, PAX3, PAX5, and PAX8 exhibited positive associations with PD-L1 expression (PAX3: ρ = 0.50, p = 0.02; PAX5: ρ = 0.60, p = 0.005; PAX8: ρ = 0.45, p = 0.04). Notably, the association between PD-L1 and PAX5 has also been observed in a large cohort of breast infiltrating carcinomas (n = 1091) from KM database (see Supplementary Figure S1A). Similarly, significant correlations were observed with TIGIT for PAX3 (ρ = 0.48, p = 0.03) and PAX5 (ρ = 0.53, p = 0.01). These findings suggest that PAX gene expression may contribute to immune evasion through upregulation of immune checkpoint pathways. In particular, PAX5 displayed the most extensive pattern of immune associations. Positive correlations were identified with B cell scores (ρ = 0.52, p = 0.02), cytotoxic lymphocyte infiltration (ρ = 0.54, p = 0.01), and NK cell markers (ρ = 0.48, p = 0.03).
2.5. PAXs Expression and Epithelial–Mesenchymal Transition and Functional Clustering
In the context of EMT, PAX genes have been associated with the expression of several genes involved in the EMT of breast carcinomas such as ZEB1 [24], FN1 [25,26,27], MMPs [28] and CDH1 [29]. PAX1 displayed strong positive correlations with several canonical EMT mediators, including FN1 (ρ = 0.66, p = 0.002), ZEB1 (ρ = 0.57, p = 0.009), and MMP2 (ρ = 0.58, p = 0.008), as well as with the pan-cancer EMT signature (Figure 4). The associations between PAX1 and FN1, ZEB1 and MMP2 were also validated in a large cohort of breast infiltrating carcinomas (n = 1091) from KM database (see Supplementary Figure S1B). These associations suggest that PAX1 expression accompanies a mesenchymal transcriptional program characterized by extracellular matrix remodelling and enhanced cellular motility. Conversely, PAX2 exhibited a distinct correlation pattern, showing a robust positive association with CDH1 (ρ = 0.67, p = 0.001), a key epithelial marker, and a negative correlation with the EMT composite score (ρ = –0.37, p = 0.097). This opposite trend highlights a potential antagonistic role between PAX1 and PAX2 in EMT dynamics, where PAX1 may promote mesenchymal traits while PAX2 preserves epithelial identity. Hierarchical clustering of correlation coefficients highlighted distinct profiles for PAX5 and PAX6, which clustered more closely with immune evasion, EMT and cell death signatures. Moreover, several PAX genes displayed significant co-expression with one another (Figure 2, Figure 3 and Figure 4), suggesting shared transcriptional regulation or involvement in coordinated oncogenic pathways.
2.6. Immunohistochemical Analysis
Immunohistochemical analysis revealed distinct expression patterns of PAX family members in breast cancer (Figure 5). PAX1 showed strong cytoplasmic positivity in most tumor samples, with a few cases also displaying nuclear staining (Figure 5A). PAX2 was consistently expressed in the nucleus with very strong positivity, while many cases also showed faint cytoplasmic staining (Figure 5B). PAX3 was detected only in the cytoplasm, with moderate staining intensity (Figure 5C). PAX4 (Figure 5D), PAX8 (Figure 5H), and PAX9 (Figure 5I) exhibited variable expression, with both nuclear and cytoplasmic positivity. Cytoplasmic localization of PAX proteins may reflect dysregulation or post-translational modifications affecting nuclear import or protein stability. Since PAX proteins are transcription factors, nuclear staining is generally associated with transcriptional activity, whereas the presence in both compartments could indicate shuttling, partial activation, or heterogeneous distribution within tumor cells.
No immunohistochemical positivity was observed for PAX5 (Figure 5E), PAX6 (Figure 5F), and PAX7 (Figure 5G) in tumor cells. However, PAX5 was strongly expressed in tumor-infiltrating lymphocytes (Figure 5E). The absence of staining for some PAX proteins may be due to post-transcriptional regulation, preventing translation, or to protein levels below the detection threshold of immunohistochemistry.
The association between RNA and protein expression plays a critical role in the development and interpretation of molecular signatures (Figure 6A). mRNA levels do not always correlate directly with protein abundance due to post-transcriptional, translational, and post-translational regulatory mechanisms. As a result, integrating both RNA and protein expression data can provide a more accurate and functionally relevant representation of cellular states. Moreover, mRNA and protein expression levels may serve as distinct biomarkers, depending on the biological or clinical context. In our cohort no significant correlation between mRNA and immunohistochemical signals of PAXs family members has been observed. However, the two approaches could be employed in a complementary manner to identify distinct prognostic biomarkers, reflecting different layers of tumor biology.
2.7. Prognostic Value of PAX Family
Transcriptomic data have been used to evaluate the ability of individual PAX genes, as well as their combined expression profiles, to predict lymph node metastasis in breast carcinomas (Figure 6A). Among the genes analysed by RNA-seq, PAX7 and PAX2 were identified as the most informative predictors (AUC 0.78): PAX7 was positively associated with nodal metastasis, whereas PAX2 showed an inverse correlation (Figure 6B).
3. Discussion
Through an integrative multi-level approach, this study highlights a subset of PAX genes as central regulators of cancer hallmarks, including cell death, immune evasion, and epithelial plasticity, in breast cancers. Notably, a specific PAX gene expression signature demonstrated an exceptional ability to predict lymph node metastasis, positioning these factors not only as potential molecular drivers of tumor dissemination, but more importantly, as promising biomarkers of tumor progression. Specifically, a two-gene RNA-seq model based on high PAX7 and low PAX2 expression yielded exceptional predictive accuracy (AUC-ROC = 0.78). These findings not only expand the oncogenic landscape of PAX genes but also lay the foundation for their integration into prognostic frameworks and future therapeutic stratification.
TCGA datasets uncovered distinct survival associations across PAX genes, with high expression of PAX3 and PAX6 correlating with poorer overall survival.
At the molecular level, the expression of PAX3, PAX5 and PAX8 demonstrated significant associations with immune checkpoint regulators, such as PD-L1 and TIGIT, suggesting a putative role in immune evasion. PAX5, in particular, also exhibited broad immunological correlations, including positive associations with B cell signatures and NK cell markers [30,31]. These results suggest a potential immunomodulatory function for PAX5, echoing emerging evidence of transcription factors shaping the immune microenvironment in solid tumors. This evidence is particularly important given that breast cancer is generally considered a low-immunogenic tumor. However, PAX5 expression may help identify a subgroup of breast cancer patients who could benefit from immunotherapy, mainly targeting specific immune checkpoints [32,33]. In a recent paper it has been demonstrated that a combined therapy with TIGIT and PD-1 inhibitors have a synergic effect on breast cancer model opening new perspectives for patients [34]. PAX1 appears to promote a mesenchymal transcriptional program characterized by extracellular matrix remodelling and increased cellular motility, in line with its strong associations with FN1, ZEB1, and MMP2. In contrast, PAX2 shows a robust positive correlation with CDH1 and an inverse association with the EMT composite score, suggesting a role in maintaining epithelial integrity. This antagonistic behavior positions PAX1 and PAX2 on opposite ends of the EMT spectrum, reflecting a potential transcriptional switch that modulates epithelial plasticity in breast cancer. In the context of cell proliferation and cell death, the observed associations between PAX1 and PAX9 and key apoptotic mediators such as BARD1 and BAX suggest that these transcription factors participate in the fine-tuning of cell death programs within the tumor microenvironment. Rather than acting as simple inducers or repressors of apoptosis, PAX proteins may modulate the dynamic equilibrium between survival and programmed cell death, thereby influencing tumor cell adaptability. The positive correlation between PAX5 and BAX further supports a broader role for PAX family members in apoptotic regulation, potentially through transcriptional control of pro-death genes [35]. Interestingly, the concomitant association of PAX9 with hypoxia and proliferation pathways indicates that apoptotic regulation by PAX factors is tightly integrated with stress adaptation and growth-promoting mechanisms.
Immunohistochemistry analysis revealed distinct subcellular localization patterns. Nuclear expression, expected for transcription factors, was prominent for PAX2, while PAX1, PAX3, PAX4, PAX8, and PAX9 displayed varied cytoplasmic and nuclear distribution. Such heterogeneity may reflect post-translational modifications, altered nuclear import mechanisms, or compartmental shuttling, raising important questions regarding the functional status of cytoplasmic PAX proteins. This may reflect translational repression, rapid protein turnover, and aligns with increasing recognition that mRNA levels do not consistently predict protein abundance. In fact, comprehensive clinical proteogenomic analysis highlighted that RNA and protein data often diverge in tumors, and that integration of both is critical for accurate phenotypic and biomarker interpretation [36,37,38].
The presence of cytoplasmic PAX proteins, typically nuclear transcription factors, suggests mislocalization or post-translational modification, such as phosphorylation and acetylation, that impedes nuclear import. This phenomenon has been reported for other transcription factors and may signal altered functional states. For instance, cytoplasmic localization of RUNX3 in gastric and breast cancers results in loss of its tumor-suppressive function, while β-catenin requires nuclear accumulation to mediate oncogenic Wnt signalling [39,40,41]. By analogy, the exclusive cytoplasmic localization of PAX3 observed in our breast cancer cohort likely reflects impaired nuclear import, limiting its transcriptional capacity and pointing to functional inactivation due to a different localization. Notably, PAX5, PAX6, and PAX7 were undetectable at the protein level, despite measurable RNA expression.
The complex and significant association between PAX genes and fundamental hallmarks of cancer lay the foundation for innovative therapeutic strategies based on the direct or indirect modulation of PAX family members. In fact, given their transcriptional nature, PAX proteins represent challenging but potentially high-yield targets. Although direct pharmacological inhibition of PAX factors remains elusive due to intrinsic disorder and nuclear localization [14], several indirect targeting strategies are emerging. For instance, the compound BG-1 has been reported to disrupt PAX2–histone methyltransferase interactions, leading to impaired transcriptional activity and reduced proliferation in renal carcinoma cells [16]. Similarly, HDAC inhibitors such as Entinostat and SAHA have been shown to suppress the oncogenic activity of PAX3:FOXO1 fusion proteins in rhabdomyosarcoma models [42,43].
Beyond indirect inhibition, efforts in structure-based virtual screening have identified small molecules (e.g., EG1) capable of impairing PAX DNA-binding function, providing proof of concept for direct interference with PAX transcriptional activity [44,45]. Biolayer interferometry (BLI) and surface plasmon resonance are emerging as useful platforms for drug screening against recombinant PAX proteins, allowing identification of lead compounds based on binding affinity and functional suppression in luciferase assays [45].
Additionally, oligonucleotide-based approaches, including antisense oligonucleotides (AONs) and RNA interference, offer another strategy to modulate PAX expression [46]. While delivery and stability remain significant hurdles, rapid progress in oligonucleotide chemistry (e.g., morpholino and PEG-conjugated compounds) may enhance clinical feasibility [47].
Despite the strength of an integrative multi-omic design and validation at both transcriptomic and protein levels, this study has limitations. The relatively small sample size of our institutional cohort may limit the generalizability of the findings. Moreover, the retrospective nature of the study may introduce potential biases related to patient selection, sample preservation, and data completeness. Although integration with external datasets such as Indivumed and TCGA partially mitigates these constraints, prospective validation in larger, independent cohorts will be essential.
4. Methods
4.1. Sample Collection
A total number of 21 samples of breast tumors samples were collected. All tissue samples were used for histological, immunohistochemical and molecular investigations. The study protocol was approved by the Institutional Ethical Committee of the “Policlinico Tor Vergata” (reference number # 96-19, 17 July 2019). All experimental procedures were conducted in accordance with the Code of Ethics of the World Medical Association, specifically the Declaration of Helsinki.
4.2. Construction of TMA
For the construction of the tissue microarray (TMA), 21 breast cancer cases were selected. Histological slides were cut from archival FFPE tumor blocks, stained with hematoxylin and eosin (H&E), and independently evaluated by two pathologists to assess tissue quality and tumor representativeness. H&E-stained slides were first digitized to enable objective identification of regions of interest.
TMA construction was subsequently performed using the fully automated TMA Grand Master system (3DHISTECH, Budapest, Hungary). Through the associated control software, digital slide images were accurately matched to the corresponding donor blocks and used to guide tissue sampling. A single 1.5 mm-diameter tissue core per case was automatically extracted based on predefined annotations and precisely arrayed into the recipient block.
From the completed TMA block, 3 µm-thick sections were cut, stained with H&E, and rescanned to verify sample integrity and alignment. For downstream digital image analysis and immunohistochemical quantification, a standardized annotation area of 1.22 mm^2^ was defined within each TMA spot using calibrated image-analysis software. This region of interest represents a fixed analytical area, independent of the original core diameter, and was applied uniformly across all samples to ensure consistent and comparable quantification.
4.3. Immunohistochemistry
Immunohistochemical analyses were conducted to investigate the expression of all the nine PAX proteins in breast cancers. Briefly, TMA sections were subjected to antigen retrieval by treating them with EDTA citrate pH 7.8 (PAX1, PAX3 and PAX9) or EDTA citrate pH 6.0 (PAX2, PAX4, PAX5, PAX6, PAX7 and PAX8) at 95 ◦C for 30 min. Subsequently, the sections were incubated with the following antibodies: a rabbit polyclonal anti-PAX1 antibody (dilution 1:100, ab203065, AbCam, Cambridge, UK), a mouse monoclonal anti-PAX2 antibody (dilution 1:500, clone PAX2/1104, A2522739, Antibodies.com, Cambridge, UK), a mouse monoclonal anti-PAX3 antibody (dilution 1:250 clone PAX3/8426, A316851 Antibodies.com, Cambridge, UK), a rabbit polyclonal anti-PAX4 antibody (dilution 1:250, PA1108, Termo Fischer, Cambridge, UK), a mouse monoclonal antibody anti-PAX5 (pre-diluted, clone PAX5-L, NCL-L-PAX-5, Leica), a mouse monoclonal antibody anti-PAX6 (dilution 1:100, clone AD2.38, ab78545, Abcam, Cambridge, UK), a mouse monoclonal antibody anti-PAX7 (dilution 1:500, clone PAX7/1187, ab218472, Abcam, Cambridge; UK), a mouse monoclonal anti-PAX8 antibody (pre-diluted, clone MRQ-50, MRQ-50 Mab, Leica) and a rabbit monoclonal anti-PAX9 antibody (dilution 1:100, clone S6MR, STJ11103106, St Jhons’s Laboratory Ltd., London, UK) for 1 h at room temperature. Washing was performed using PBS/Tween20 pH 7.6. The reactions were visualized using the HRP-DAB Detection Kit (UCS Diagnostic, Rome, Italy). Immunostained TMA slides were digitized using a Pannoramic Midi II Rx scanner (Epredia, Portsmouth, New Hampshire, USA) and analyzed with AI-assisted image analysis software (SlideViewer Quant Center; 3DHISTECH, Budapest, Hungary). Immunoreactivity was quantified by counting the number of positively stained cancer cells within a predefined annotation area of 1.22 mm^2^ for each TMA spot. The software enabled accurate identification and classification of positive cells. Negative (no primary antibody incubation) and positive controls have been performed for each reaction.
4.4. Nucleic Acid Extraction and Quality Assessment
Frozen tissue fragments were lysed in sample buffer supplemented with β-mercaptoethanol and mechanically homogenized using the BeadBug system. Nucleic acids were isolated simultaneously from each preparation using the Qiagen AllPrep Universal Kit (QIAGEN, Hilden, Germany) following the manufacturer’s protocol. DNA and RNA yields were quantified via Qubit fluorometry (dsDNA BR or RNA BR assays, respectively). Quality assessment was performed on an Agilent Tapestation using the Genomic DNA and High-Sensitivity RNA ScreenTape kits (Agilent Technologies, Santa Clara, CA, USA). RNA samples were considered suitable for library preparation only if exhibiting a RIN ≥ 4 or DV200 ≥ 60.
4.5. Library Preparation and NGS Sequencing
Whole-genome sequencing (WGS) libraries were generated using the PCR-free KAPA Hyper Prep Kit (Roche, Basel, Switzerland). For transcriptomic profiling, ribosomal RNA was removed with the Ribo Zero Kit (Illumina, San Diego, CA, USA) prior to library construction using the TruSeq Stranded Total RNA Kit (Qiagen, Hilden, Germany). All procedures were carried out following the manufacturers’ guidelines. Sequencing was conducted on an Illumina NovaSeq 6000 platform with 150 bp paired-end reads. WGS achieved mean coverage ≥60× in tumor samples and ≥30× in matched normal tissues, with ≥95% genomic breadth of coverage. RNA-seq libraries yielded ≥100 million total reads, with <20% rRNA content and ≥20 million reads mapping to mRNA transcripts based on the Ensembl annotation. Ribosomal depletion targeted both nuclear and mitochondrial rRNA species.
4.6. NGS Data Processing
NGS data were aligned to the GRCh38 human reference genome. Detection and annotation of short variants in normal DNA were performed using Haplotype Caller within the GATK pipeline (GATK v4.2.6.1) [48]. Germline alterations from WGS were annotated with SnpEff [49], while somatic variants were identified through a consensus approach incorporating Mutect2, Strelka [50], VarScan [51], and SomaticSniper [52]. Structural variants were inferred using the TitanCNA [53] and DellyCNV [54] R packages. RNA-seq expression values were normalized as transcripts per million (TPM). Immune and stromal cell infiltration within tumor samples was estimated using MCPCounter algorithms [55,56]. Pathway activity scores for JAK–STAT signalling, estrogen receptor activity, TGF-β signalling, and TP53 signalling were inferred using multi-gene signatures implemented in the PROGENy bioinformatic framework [57]. The proliferation score was calculated by gene set variation analysis (GSVA) using the R package GSVA and a predefined set of 40 cell cycle-related genes (E2F5, TYMS, CCNE1, CCNA2, CCNE2, CCNF, CDC25A, CENPF, CDC45L, TOP2A, CDC6, BIRC5, CDKN3, BUB1, E2F1, BUB1B, MCM2, CCNB1, MCM6, CCNB2, NPAT, CDC2, PCNA, CDC20, SLBP, CDC25B, BRCA1, CDC25C, CDKN2C, CDKN2D, DHFR, CENPA, MSH2, CKS1, NASP, CKS2, RRM1, PLK1, RRM2, STK15) [58].
The Gene70 signature, comprising 70 genes, was used to generate a prognostic profile using the R package genefu [59]. The pan-cancer hypoxia scores were derived from the 52-gene expression signature originally developed by Buffa et al. [60], while the pan-cancer EMT score was based on the gene expression signature reported by Mak et al. [61]. IFN-gamma Score were calculated based on Ayers M. et al. [62].
4.7. Bioinformatic Analysis
A bioinformatics analysis was conducted using publicly available datasets through the UCSC Xena platform (https://xena.ucsc.edu/compare-tissue/) [22], accessed on 22 July 2025. To assess the potential prognostic value of PAX genes in breast cancers. A one-way ANOVA or Student’s t-test was performed to associate PAX genes expression in normal, tumor and metastatic samples (*** p < 0.001; **** p < 0.0001). In addition to investigating gene–gene associations, the KM tool was interrogated (https://kmplot.com/analysis/ accessed on 30 January 2026). Using a cohort of 1091 infiltrating breast carcinomas, the significant associations identified in our cohort were further evaluated [63].
4.8. Statistical Analysis
All statistical analyses were conducted using Python (v3.10). Continuous variables, including RNA-seq expression levels of PAX genes and IHC scores, were first assessed for normality using the Shapiro–Wilk test. Based on the distribution, either Spearman’s rank correlation coefficient (for non-normally distributed or mixed pairs) or Pearson’s correlation coefficient (for normally distributed pairs) was calculated to assess the strength and direction of associations. Correlation analyses were restricted to pairwise complete cases without imputation. Associations were considered statistically significant at a p-value < 0.05. Results are reported as correlation coefficients (ρ) with corresponding p-values. Visualizations were generated using annotated heatmaps to facilitate interpretation of the correlation matrices. To assess the predictive role of PAX gene expression in lymph node metastasis, we conducted a multistep analysis using RNA-seq data. Data were standardized using z-score normalization and implemented a multivariate logistic regression model with L2 regularization. The model was trained on all nine PAX genes and evaluated using 5-fold stratified cross-validation. Discriminative performance was quantified by the area under the receiver operating characteristic curve (AUC-ROC). To identify the most influential predictors, coefficients were extracted and ranked by magnitude. We additionally simulated the predicted probability of metastasis for a patient expressing high levels (75th percentile) of all PAX genes.
5. Conclusions
In conclusion, we provide an integrative multiomics analysis of the role of PAX transcription factors in breast cancer through transcriptomic and immunohistochemical approaches. These findings lay the foundation for the integration of PAX signature into prognostic frameworks and future therapeutic stratification. This is particularly relevant in the current clinical context, where, despite the advent of molecular subtyping in breast cancer, with the delineation of distinct groups defined by specific prognoses and therapeutic responses, a substantial proportion of patients still fail to benefit from standard treatments.
The reference list from the paper itself. Each links out to its DOI / PubMed record.
- 1Available online: https://gco.iarc.fr/en(accessed on 5 January 2026)
- 2Kim J. Harper A. Mc Cormack V. Sung H. Houssami N. Morgan E. Mutebi M. Garvey G. Soerjomataram I. Fidler-Benaoudia M.M. Global patterns and trends in breast cancer incidence and mortality across 185 countries Nat. Med.2025311154116210.1038/s 41591-025-03502-339994475 · doi ↗ · pubmed ↗
- 3Allison K.H. Hammond M.E.H. Dowsett M. Mc Kernin S.E. Carey L.A. Fitzgibbons P.L. Hayes D.F. Lakhani S.R. Chavez-Mac Gregor M. Perlmutter J. Estrogen and Progesterone Receptor Testing in Breast Cancer: ASCO/CAP Guideline Update J. Clin. Oncol.2020381346136610.1200/JCO.19.0230931928404 · doi ↗ · pubmed ↗
- 4Wolff A.C. Somerfield M.R. Dowsett M. Hammond M.E.H. Hayes D.F. Mc Shane L.M. Saphner T.J. Spears P.A. Allison K.H. Human Epidermal Growth Factor Receptor 2 Testing in Breast Cancer: ASCO-College of American Pathologists Guideline Update J. Clin. Oncol.2023413867387210.1200/JCO.22.0286437284804 · doi ↗ · pubmed ↗
- 5Yang X. Smirnov A. Buonomo O.C. Mauriello A. Shi Y. Bischof J. Woodsmith J. TOR CENTRE Melino G. Candi E. A primary luminal/HER 2 negative breast cancer patient with mismatch repair deficiency Cell Death Discov.2023936510.1038/s 41420-023-01650-437783677 PMC 10545677 · doi ↗ · pubmed ↗
- 6Abderrahman B. Jordan V.C. Telling details of breast-cancer recurrence Nature 201855315510.1038/d 41586-018-00399-629323320 · doi ↗ · pubmed ↗
- 7Carlino F. Solinas C. Orditura M. Bisceglia M.D. Pellegrino B. Diana A. Editorial: Heterogeneity in breast cancer: Clinical and therapeutic implications Front. Oncol.202414132165410.3389/fonc.2024.132165438469228 PMC 10926019 · doi ↗ · pubmed ↗
- 8Guo L. Kong D. Liu J. Zhan L. Luo L. Zheng W. Zheng Q. Chen C. Sun S. Breast cancer heterogeneity and its implication in personalized precision therapy Exp. Hematol. Oncol.2023123 Erratum in Exp. Hematol. Oncol. 2024, 13, 7.10.1186/s 40164-022-00363-136624542 PMC 9830930 · doi ↗ · pubmed ↗
