Immunogenomic diversity of triple-negative breast cancers in obese and non-obese black and white women
Fokhrul Hossain, Denise Danos, Jovanny Zabaleta, Xiao-Cheng Wu, Luis Del Valle, Chindo Hicks, Jiande Wu, Jerneja Tomsic, Susan Neuhausen, Yuan Chun Ding, Kavitha Mukund, Zahra Mesrizadeh, Shankar Subramaniam, Augusto Ochoa, Victoria Seewaldt, Lucio Miele

TL;DR
This study explores how obesity and race influence the genetic and immune features of triple-negative breast cancer in Black and White women.
Contribution
A novel TNBC subgroup with a unique transcriptional profile was identified, equally present in Black and White women.
Findings
Black race or obesity were not predictive of poor outcomes in TNBC patients.
A new TNBC subgroup with luminal-like and stem-cell-like transcriptional features was identified.
Tumors from Black women showed higher immune cell abundance, suggesting potential therapeutic implications.
Abstract
Racial disparities in incidence and outcomes of triple-negative breast cancer (TNBC) have been attributed to ancestry, socioeconomic factors and/or obesity. We studied 253 TNBCs from 128 Black and 125 White women. Arms were balanced for age, AJCC stage, histological grade and molecular subtypes, and differed significantly in BMI distribution, obesity, and Area Deprivation Index. We examined survival rates, whole-transcriptome RNASeq and genetic ancestry. In our sample, Black race or obesity were not intrinsically predictive of poor outcomes. TNBC molecular portraits varied with stage and biological aggressiveness, irrespective of race. We identified a novel group of TNBCs with a distinctive luminal-like and stem-cell-like transcriptional profile that were equally distributed among Blacks and Whites. Genetic ancestry was highly admixed. Immune deconvolution showed a higher abundance of…
Genes, proteins, chemicals, diseases, species, mutations and cell lines named across the full text — each resolved to its canonical identifier and authoritative record.
Click any figure to enlarge with its caption.
Figure 1
Figure 2- —https://doi.org/10.13039/100000054National Cancer Institute
- —https://doi.org/10.13039/100000057National Institute of General Medical Sciences
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsCancer Risks and Factors · Breast Cancer Treatment Studies · Breast Lesions and Carcinomas
Introduction
Triple-negative breast cancer (TNBC) is a heterogeneous group of clinically aggressive tumors. TNBCs have a high risk of recurrence and metastasis, with limited options for targeted treatment^1–3^. Several studies suggested higher TNBC incidence and/or mortality rates among self-identified Black women compared to other racial/ethnic groups. Recent publications suggested that risk for TNBC^4–6^ and hereditary susceptibility for TNBC^7^ may be linked to West African/African ancestry. A recent study identified a 59-gene “race-specific” signature that discriminated between breast cancers from Black and White women^8^. None of these genes were associated with differential survival after controlling for multiple comparisons. However, that study included tumors of all molecular subtypes and did not focus on TNBC.
Racial/ethnic minorities are more likely to be exposed to unfavorable physical and socioeconomic environments that increase health risks^9–11^. Numerous studies suggest that chronic stress associated with an adverse neighborhood environment affects health through multiple mechanisms^12–16^. Using Louisiana Tumor Registry (LTR) data, we previously reported an association between neighborhood concentrated disadvantage index (CDI) and racial disparities in TNBC outcomes^17^.
The immune microenvironment contributes to tumor biology, transcriptional profiles and clinical outcomes. Using TCGA and METABRIC data, Liu et al. found that TNBCs had higher immune cell infiltration, including immunosuppressive cells, as well as higher expression levels of inflammation-promoting and metastasis-promoting genes, compared to non-TNBC breast cancers^18^. A recent immunogenomic study identified a tumor immune risk score (TIRS) panel containing eight potential biomarkers that stratifies high- and low-risk TNBC^19^. This study also identified four tumor immune microenvironment types (TIMTs) predictive of clinical outcomes^19^. In White women, the expression of immunologically relevant transcripts predicts good response to neoadjuvant chemotherapy (NACT)^20^. However, it is unclear whether the same transcripts predict a similar response in Black women. Martini et al. recently described ancestry-associated gene expression profiles in TNBC associated with population-based immune stroma landscapes^21^.
Obesity has been linked to increased risk of TNBC^22^ and poor response to NACT^23^. In a case-control study, including 2295 breast cancer cases (364 TNBC) and 8779 matched controls, we^24^ showed that in our sample, obesity was not associated with increased TNBC risk. Conversely, type 2 diabetes was strongly associated with TNBC risk irrespective of race or age. However, that study did not address TNBC outcomes or tumor biology. We aimed to understand the relationships between race, obesity, tumor molecular portraits, immunogenomic signatures, and clinical outcomes in Louisiana women with TNBC. We profiled the transcriptome of 253 TNBCs from 128 Black and 125 White women. Arms were balanced for age, AJCC stage, histological grade and molecular subtypes, and differed significantly in BMI distribution, obesity, and Area Deprivation Index. Genetic ancestry was highly admixed, with only 50% of Black women having ≥80% West African ancestry and 64% of White women having ≥80% Western European ancestry. We asked whether race-associated or obesity-associated differences existed in TNBC subtype distribution or gene expression profiles. Additionally, we asked whether gene expression profile differences existed based on AJCC tumor stage and clinical aggressiveness as determined by 2-year survival. Finally, we used two distinct bioinformatic deconvolution algorithms to estimate the distribution of immune microenvironment cell populations based on race, obesity, genetic ancestry, age, AJCC stage and clinical aggressiveness.
Results
Table 1 shows the basic demographic, clinico-pathological and molecular characteristics of the tumors we studied. The study sample from which our tumors were selected is shown in Supplementary Table 1.Table 1. Demographics of our sample, clinico-pathological and molecular characteristics of TNBC tumorsAllBlackWhite%n%n%n**p valueAll100.0253100.0128100.0125Age, y0.1390 Mean, std62.212.861.011.963.413.5Age0.6204 Less than 5017.44418.82416.020 50 or older82.620981.310484.0105BMI0.0045 Lean30.47721.12740.050 Obese I31.68032.04131.239 Obese II20.25124.23116.020 Obese III17.84522.72912.816Obese0.0016 No30.47721.12740.050 Yes69.617678.910160.075AJCC stage0.0960 I41.110435.24547.259 II47.011950.06444.055 III9.12312.5165.67 IV2.872.333.24Late stage0.0562 No41.110435.24547.259 Yes58.914964.88352.866Grade0.5891 12.462.332.43 211.52912.51610.413 382.220882.810681.6102 Unknown4.0102.335.672-year survival0.5812 No13.43414.81912.015 Yes84.221382.810685.6107 Censored2.462.332.43TNBC type0.1830 Basal-like 114.63713.31716.020 Basal-like 27.1187.097.29 Immuno-modulatory19.44920.32618.423 Luminal androgen receptor5.5147.094.05 Mesenchymal13.83515.62012.015 Mesenchymal stem-like16.64210.91422.428 Unclassified7.92010.9144.86 Excluded15.03814.81915.219Area deprivation index national rank<0.0001 Median (IQR), n76 (48–89)251^a^84 (67–92)126^a^56 (37–80)125BMI body mass index, AJCC American Joint Commission on Cancer, TNBC Triple negative breast cancer, IQR Inter-quartile range.^a^Area deprivation index could not be computed for 2 patients.
The two arms were not significantly different in age, AJCC stage, histological grade, 2-year survival, or molecular subtype as defined by TNBCtype^25,26^. We used the original 7-subtype Lehmann classification because the tumor microenvironment was a focus of our study. Genes expressed in the microenvironment contributed to the definition of 2 of the original subtypes^27^, and differences in NACT response among the original 7 subtypes were described^28^. The two arms differed significantly in BMI distribution and obesity rates, with Black women showing a higher prevalence of obesity and generally higher BMIs. The most significant difference between Black and White women was area-based socioeconomic disadvantage, as represented by ADI (Table 1, p < 0.0001). The prevalence of later-stage tumors (AJCC 2–4) as opposed to Stage 1 approached statistical significance (p = 0.0562).
Transcriptomic analyses
We classified tumors using the TNBCtype algorithm^26^. However, the algorithm excluded 38 tumors from analysis. These tumors were confirmed to be TNBC by IHC. Their GEPs were clearly distinct from the tumors included in the analysis by the algorithm (Fig. 1A, B and Supplementary Table 2). The 3406 differentially expressed transcripts included a Notch signaling signature (high DLL1, HES1, and DTX3, low DLL3), as well as transcripts linked to estrogen signaling (TFF1, AHR), luminal B breast cancers (POTED), VEGFA, EGFR, BRAF, TGFB1, and several cytokines (e.g., IL17, IL12B, IL21, IL18). IPA (Supplementary Table 3) identified numerous pathways distinguishing TNBC-type-excluded from TNBC-type-included tumors. These included cyclin-cell cycle pathways, estrogen signaling, Notch, WNT, Sonic Hedgehog, TGF-β, HIPPO, and other cell fate regulatory pathways.Fig. 1TNBC-excluded tumors have distinctive lumino-basal transcriptomic profiles.A Volcano plot and B Heatmap showing the subset of tumors excluded from TNBC-type analysis and characterized by a highly distinctive gene expression profile. C Volcano plot and D Heatmap showing transcripts significantly different between TNBC-type-classified and unclassified tumors in our sample.
Gene network analysis (Supplementary Table 4) identified endocrine system disorders, organismal injury and abnormalities, reproductive system disease, and developmental disorder, and others as the most significantly different networks. These tumors may represent a distinct “lumino-basal” phenotype and deserve further in-depth investigation. They were equally distributed between Black and White women. Among the 215 tumors included into the TNBCtype analysis, subtype distribution did not significantly differ between Black and White women (Table 1). However, 20/215 tumors were “unclassified” into the canonical subtypes. Of these, 14/20 occurred in Black women and 18/20 occurred in obese women. Seventy transcripts were differentially expressed in the unclassified versus classified tumors (Fig. 1C, D and Supplementary Table 5), including, among others, MYCN and immune-related transcripts such as CD3E and MICB (an NKGD2 ligand involved in NK and CD8 activation). IPA (Supplementary Table 6) identified chondroitin sulfate and dermatan sulfate degradation, telomere extension, and several immunologically relevant pathways that distinguished these tumors from classified TNBCs. Gene network analysis differences between unclassified and classified TNBCs are shown in Supplementary Table 7. As AJCC stage at diagnosis was strongly associated with survival (Supplementary Fig. 1) and Black women tended to be diagnosed at later stages, we compared GEPs of stage 1 tumors (confined to the breast and smaller than 2 cm in major diameter) to stages 2-4 tumors. One hundred forty-two transcripts were significantly different in stage 1 versus later stage tumors (Fig. 2A, B and Supplementary Table 8). Of 20 transcripts upregulated in later stage tumors, the most upregulated was LOC101928525, a lncRNA from a locus in chromosome 9q34.3 in the (-) strand of the MSRP2 gene, involved in mitochondrial metabolism^29^. Among the most significantly downregulated transcripts in later stage tumors are DEFB132 (a defensin involved in innate immunity), C14orf180 (an integral plasma membrane protein), TMEM252 (an integral plasma membrane protein), and LINC01537, an LncRNA associated with squamous carcinomas. None of the transcripts significantly up- or down-regulated in stage 1 versus later stage tumors was significantly different between Black and White women.Fig. 2. Transcriptomic differences between early- and late-stage tumors.A Volcano plot and B Heatmap showing transcripts significantly different between AJCC Stage 1 tumors and AJCC Stages 2–4 tumors.
When we compared tumors that were fatal within 2 years of diagnosis to tumors with survival > 2 years, 20 transcripts were significantly different (Supplementary Fig. 2A, B and Supplementary Table 9). Among these, tumors with survival < 2 years showed decreased expression of TMEM64, a WNT negative regulator, and increased expression of EPN1 (Epsin-1), which promotes IKKgamma ubiquitination to drive NF-kB signaling in TNBC progression^30^.
Unsupervised cluster analysis identified 26 transcripts significantly different between self-reported Black and White women (Supplementary Fig. 3 and Supplementary Table 10). Pathway analysis (Supplementary Table 11) showed that xenobiotic metabolism and glutathione redox were the most significantly different pathways, alongside other breast cancer-related pathways such as WNT, embryonic stem cell pluripotency, mitochondrial dysfunction, and breast cancer regulation by stathmin. Gene network analysis (Supplementary Table 12) showed 5 significantly different networks, including cell death and survival, lipid metabolism/small molecule biochemistry, and cell-mediated immune response, consistent with xCell results (see below). Three de-ubiquitinase genes in chromosome 8p23.1, USP17L1, USP17L4, and USP17L8, were differentially expressed between Black and White women. USP17L1 and USP17L4, which are contiguous to one another, were relatively over-expressed in Black women, while USP17L8, which is centromeric to USP17L1 and USP17L4, was relatively over-expressed in White women. Loss of heterozygosity in chromosome 8p, a region that includes genes controlling lipid metabolism^31^ is a marker of breast cancer progression^32^. Additionally, 8p23.1 chromosomal inversion is associated with poor prognosis in breast cancer^33^. FSHR, which was significantly over-expressed in White women, encodes the FSH receptor, a marker for vascular invasion and remodeling in breast cancer^34,35^. Twenty/26 transcripts, including USP17L1, USP17L4, USP17L8, and FSHR, remained significantly different when we limited our analysis to women older than 50 (104 Black and 105 White) (not shown). We observed no significant racial differences in GEPs in women younger than 50 (24 Black and 20 White), possibly due to the small sample size. Obesity did not account for these GEP differences. When we compared tumors from obese versus lean women, only 2 transcripts were significantly different (AP4E1 and KRT20, both significantly downregulated in tumors from obese women compared to lean ones). When we limited our analysis to obese women (101 Black and 75 White), 13 of the original 26 genes, including USP17L4, USP17L8, and FSHR, remained significantly different (not shown). Given the heavily admixed geographic ancestry of our cohort (see below) and the highly significant difference in ADI between the two arms, the hypothesis that these transcriptomic differences may be associated with life-long socioeconomic disadvantage deserves further investigation.
Genetic ancestry analysis
Our cohort was genetically heavily admixed, including women with significant Mediterranean, North African, Native American, Pacific Islander and Asian components, consistent with the complex history of Louisiana (Supplementary Table 13 and Supplementary Fig. 4). Using an 80% threshold for predominant ancestry, only 59/118 (50%) Black women with evaluable DNA had >80% Central/West African ancestry, 71/111 (64%) of White women with evaluable DNA had >80% Northern/Western European ancestry, and 99/229 (43%) of the evaluable women in the sample set were multi-ancestral. Using a less stringent 60% threshold, 105/118 Black women (89%) had >60% Central/West African ancestry, 2/118 (1.7%) had >60% Western/Central European ancestry, and 12/118 (11%) were multi-ancestral. Among White women with evaluable DNA, 95/111 (86%) had >60% Northern/Western European ancestry, 2 (1.8%) had >60% Central/Western African ancestry, and 10 (9%) were multi-ancestral.
Immune deconvolution
Next, we focused on tumor immune microenvironment-associated transcripts. CIBERSORTx^36^ results by race, ancestry, obesity, stage, and survival (Table 2) showed that memory B-cells were more common in Black, lean, and early-stage women, while activated mast cells were more common in Black women and women with at least 80% West African ancestry when compared to those with at least 80% European ancestry. Additionally, regulatory T cells were elevated among women with West African ancestry.Table 2CIBERSORTx results by race, ancestry, obesity status, age, stage, and 2-year survivalRaceAncestry^a^ObesityAgeStage2 y survivalp valueElevatedp valueElevatedp valueElevatedp valueElevatedp valueElevatedp valueElevatedB cells memory0.0031Black0.15090.0322Lean0.92440.0413Early0.5060B cells naive0.06810.16350.24880.0320 <500.05760.4228Dendritic cells activated0.92310.21520.45250.62990.10540.3970Dendritic cells resting0.07060.29950.24760.71840.40820.5434Eosinophils0.33800.17290.08580.27660.19700.8076Macrophages M00.80850.97390.50850.49500.25550.4253Macrophages M10.50750.99440.05810.0075<500.0018Late0.4577Macrophages M20.30170.41990.47860.0264≥500.11210.9583Mast cells activated0.0073Black0.0059AA0.16770.53800.13670.4456Mast cells resting0.24580.0230EA0.76570.92590.0006Early0.8779Monocytes0.25710.16560.64800.0367≥500.0343Early0.5435Neutrophils0.71670.44480.90840.21930.50690.6775NK cells activated0.06840.13730.52270.27220.54730.2284NK cells resting0.34710.77180.71920.86500.28320.5412Plasma cells0.86000.99590.24190.27590.11690.6123T cells CD4 memory activated0.83820.47800.65990.0399<500.0191Late0.1802T cells CD4 memory resting0.78070.47290.58430.87750.05770.1601T cells CD4 naive0.53320.15290.10910.39890.83930.5176T cells CD80.20530.76100.56410.24890.21340.0788T cells follicular helper0.91890.23670.10430.08770.0166Late0.5160T cells gamma delta0.56990.11940.91280.47200.14630.7703T cells regulatory Tregs0.27320.0453AA0.07740.71940.07420.6151AA African American, EA European American.^a^Comparisons by ancestry included only patients found to have at least 80% West African ancestry (AA; n = 59) or at least 80% European ancestry (EA; n = 71) from genetic ancestry analysis.Bold values indicate statistically significant differences.
Tumors from women <50 years of age contained more M1 macrophages and activated memory CD4 cells compared to older women, while tumors from older women contained more M2 macrophages and monocytes. Stage 1 tumors contained more memory B-cells, monocytes, and resting mast cells, while later stage tumors contained more M1 macrophages, activated memory CD4 cells, and follicular helper T-cells. CIBERSORTx analysis confirmed that the TNBC-type-excluded tumors clustered separately from the canonical subtypes, notably different for higher content in mast cells and lower content in M0 macrophages, while the TNBC-type-unclassified tumors were most similar to the IM subtype (Supplementary Fig. 5A and Supplementary Table 14). Several of the cell populations identified by CIBERSORTx were validated by IHC (Supplementary Fig. 5B). xCell analysis^37^, which uses a different logic compared to CIBERSORTx and estimates 64 stromal populations, confirmed that B-cells were enriched in tumors from Black women and women with at least 80% West African ancestry, along with plasma cells, interdigitating dendritic cells (iDC), CD4 T-cells, CD4 central memory, CD4 memory and CD8 effector memory cells, while tumors from White women were relatively enriched in M2 macrophages, resting mast cells, monocytes, neutrophils, activated dendritic cells (aDC) and conventional dendritic cells (cDC) (Table 3). B-cells, CD4 memory T-cells, CD4 naïve T-cells, CD4 effector memory T-cells, CD8 central memory and naïve CD8 T-cells were also enriched in tumors with survival >2 years, along with Th1 and Treg T-cells and plasmacytoid dendritic cells (pDC). xCell analysis also showed numerous significant differences among TNBC-type-excluded, TNBC-type-unclassified, and the canonical subtypes (Supplementary Table 15).Table 3xCELL results by race, ancestry, obesity, age, stage and 2-year survivalRaceAncestry^a^ObesityAgeStage2 y survivalp valueElevatedp valueElevatedp valueElevatedp valueElevatedp valueElevatedp valueElevatedAdipocytes0.29440.14130.49320.2295**<0.0001Early0.4604Astrocytes0.0199White0.44110.19010.48370.35480.4607B-cells0.0470Black0.0141AA0.46310.06980.21140.0113YesBasophils0.28480.19670.52450.54500.07160.9762CD4+ T-cells0.0040Black0.0019AA0.35170.48150.41230.0612CD4+ Tcm<0.0001Black0.0185AA0.29130.90340.0233Early0.9322CD4+ Tem0.96440.0412AA0.62150.10660.78090.0344YesCD4+ memory T-cells0.0017Black0.0002AA0.0376Obese0.0092**<500.0008Late0.0057YesCD4+ naive T-cells0.83690.22660.89720.57810.0107Early0.0285YesCD8+ T-cells0.78850.19540.68760.43010.78590.1527CD8+ Tcm0.71350.20950.88980.0303<500.20710.0449YesCD8+ Tem0.0083Black0.0003AA0.35650.51870.44410.0978CD8+ naive T-cells0.05410.0018AA0.24070.0317<500.0275Late0.0331YesCLP**<0.0001Black<0.0001AA0.11990.0440**<50**<0.0001Late0.8666CMP<0.0001White0.0025EA0.84840.84380.0041Early0.8647Chondrocytes0.81270.79440.09180.0287**≥500.0010Early0.4749Class-switched memory B-cells0.98810.38870.60550.29770.55540.0993DC**<0.0001White0.0026EA0.33680.32450.09470.7862Endothelial cells0.0008Black0.06050.42000.91510.0004Early0.7182Eosinophils0.69680.42710.93160.33930.0150Early0.2153Epithelial cells0.64260.96830.92190.86770.0039Late0.0281NoErythrocytes0.15160.36200.34860.51560.08990.5713Fibroblasts0.0222White0.05930.83830.2714<0.0001Early0.7460GMP0.07850.36200.25010.46450.79070.3300HSC0.20050.67040.41670.89000.0002Early0.1477Hepatocytes0.86010.30310.21940.0340**≥500.0004Early0.0664Keratinocytes0.23980.85340.49020.90710.0464Late0.0402NoMEP**<0.0001White<0.0001EA0.09590.19110.07220.4020MPP0.31161.00000.50830.64640.23130.6895MSC<0.0001White0.0001EA0.0296Lean0.58760.32410.3087Macrophages0.83650.18940.0326Obese0.17940.93530.1432Macrophages M10.33740.07860.0279Obese0.11270.14400.3737Macrophages M20.0131White0.14250.40770.06920.0007Early0.6102Mast cells<0.0001White0.0022EA0.38070.12260.13270.9563Megakaryocytes0.0083White0.0019EA0.20480.3152<0.0001Early0.5062Melanocytes0.22990.77290.23750.38200.85710.1712Memory B-cells0.05670.0120AA0.56360.0072**<500.19950.0537Mesangial cells0.0294Black0.15740.52700.38800.0011Early0.2529Monocytes0.0099White0.13750.58150.35040.07800.2241Myocytes**<0.0001White0.0007EA0.91940.06550.12770.7261NK cells0.06000.22570.34400.41580.14610.2846NKT0.68360.27560.07850.81840.52670.3579Neurons0.47780.92730.99250.09770.0001Early0.9166Neutrophils0.0166White0.0384EA0.86600.16190.38840.0778Osteoblast0.0137Black0.0313AA0.06570.21270.0001Late0.1221Pericytes0.0245White0.0106EA0.08760.13350.0205Early0.3233Plasma cells0.0243Black0.0138AA0.92890.0389**<500.0002Late0.1751Platelets0.17910.09740.74720.31840.0045Early0.9650Preadipocytes0.0119White0.0040EA0.48570.2695**<0.0001Early0.3783Sebocytes0.0016White0.0052EA0.73840.22480.0040Late0.0407NoSkeletal muscle<0.0001White0.0019EA0.19650.50090.0005Early0.6522Smooth muscle<0.0001Black<0.0001AA0.14610.06590.0082Late0.8181Tgd cells0.25340.0224AA0.16570.0663<0.0001Late0.0212YesTh1 cells0.32800.23520.24540.77770.0006Late0.1749Th2 cells0.45270.89200.77920.0228**<50**<0.0001Late0.7094Tregs0.55390.38070.52970.94010.40980.0197YesaDC<0.0001White<0.0001EA0.63480.45870.77030.1618cDC0.0149White0.13930.57410.37280.0093Early0.6054iDC0.0132Black0.0421AA0.13140.56500.05580.7121Lymphatic endothelial cells<0.0001Black0.0005AA0.11300.82710.0019Early0.9669Microvascular endothelial cells0.27910.74100.17890.76600.0001Early0.6743Naive B-cells<0.0001Black<0.0001AA0.65320.0167**<500.32500.1743pDC0.91740.0276AA0.25400.13030.06180.0085Yespro B-cells0.96860.75910.16630.61340.0022Late0.2921ImmuneScore0.67500.10460.10250.12400.84150.0609StromaScore0.80380.98510.88270.4835**<0.0001Early0.5676MicroenvironmentScore0.86630.60370.29270.61880.0008**Early0.1116AA African American, EA European American.^a^Comparisons by ancestry included only patients found to have at least 80% West African ancestry (AA; n = 59) or at least 80% European ancestry (EA; n = 71) from genetic ancestry analysis.Bold values indicate statistically significant differences.
The most consistent immune microenvironmental difference identified by both algorithms was higher B-cell lineage infiltration in Black women. Tumor-infiltrating B-cells (B-TIL) are a favorable prognostic indicator in breast cancer^38^. Tumor-infiltrating B- and T-cells are a positive prognostic indicator in TNBC^39^.
Survival analysis
Supplementary Fig. 1 shows a highly significant association between stage at diagnosis and survival in our sample, consistent with our previous observations^17^. Due to the magnitude of the effect of stage on survival, additional associations with survival were performed via Cox proportional hazard models, controlling for stage as a fixed covariate. Results are reported as Supplementary Table 16. Race was not significantly associated with survival (p = 0.094), but obese I and II cases had significantly greater survival compared to lean cases (p < 0.05). Additionally, there were significant differences in survival by TNBCtype, with Immuno-modulatory having the greatest rates of survival (p < 0.05), consistent with the literature^40,41^. The naïve B cell population in the tumor microenvironment (CIBERSORT) was associated with greater survival (p < 0.05) but not the memory B cell population (p > 0.05).
Discussion
We compared GEPs in TNBC from Black and White women with TNBC from Louisiana. The study arms differed significantly in BMI, obesity, and area deprivation index (ADI). The limited GEP differences we observed between TNBCs from Black and White women were not linked to stage, survival, or obesity. The relative roles of socioeconomic exposures and ancestry admixture in the GEP differences we observed remain to be determined. In our sample set, race was not associated with differences in survival, while obesity (as determined by BMI) was associated with longer survival. AJCC stage was a strong predictor of survival. The relationship between BMI and TNBC survival is poorly understood, with some studies showing no association^42,43^ and others an association with poor prognosis^44,45^. However, an “obesity paradox” has been proposed for TNBC, whereby obesity may promote response to immunotherapy by modulating the immune microenvironment^46^. In our sample, CD4 memory T-cells (Tcm) were associated with Black race, African ancestry, obesity, late stage and survival over 2 years. The relationship between the immune microenvironment, BMI, race and ancestry deserves further investigation.
Our analysis revealed a subset of 38 immunohistochemically confirmed TNBCs (<1% ER and PR expression) that were excluded from TNBC-type analysis and had vastly different transcriptomic features from TNBCs-included tumors. These tumors appear to have a “lumino-basal” transcriptomic signature, with estrogen signaling and Notch activity signatures. Haughian et al.^47^ showed that over 50% of primary ER(+)PR(+) breast cancers contain an ER(-)PR(-)CK5(+) “lumino-basal” subpopulation, which is driven by Notch signaling and shows phenotype and GEP matching those of TNBC. Whether this subset of tumors originate from initially ER(+) pre-neoplastic lesions through phenotypic plasticity and/or loss of ER expression in luminal progenitor cells, and whether these tumors are sensitive to Notch inhibition, remains to be established.
The molecular profiles of tumors varied significantly with AJCC stage irrespective of race, indicating that tumor progression is associated with remarkable transcriptomic and immunological changes. Rapidly fatal (<2 year) tumors also had distinctive molecular profiles irrespective of race.
We observed immunogenomic diversity based on race, ancestry, age, stage, obesity, and survival. Our xCell results (Table 3) are generally consistent with those of Martini et al.^21^, who showed that several immune populations, including B-cells were more abundant in TNBC from women of West African ancestry. Our results confirm these findings and indicate an association between B-cell infiltration and improved survival. Intratumoral B-cells promote optimal anti-tumor T-cell responses and are thought to contribute to the efficacy of immunotherapy^48^. The enrichment in cDC and aDC in tumors from White women suggests that immunotherapies harnessing DCs^49^ may be of interest in this group of patients. In our heavily admixed cohort, the possible roles of different ancestries and their interactions with the exposome in modulating gene expression will require additional studies. Our cohort also differs from that studied by Martini et al.^21^ in that our women were from a single US state (Louisiana), and the two arms of the study were demographically and clinico-pathologically matched except for BMI and ADI.
Limitations of our study include the exclusion of overweight patients to maximize statistical power to detect differences between lean and obese patients, the potential for selection bias, and the use of retrospective registry data. Our sample size did not permit an analysis of different degrees of obesity (I, II, or III). Supplementary Table 17 shows the demographic distribution of our cases across the obesity spectrum. Importantly, BMI does not necessarily provide information on the metabolic status of patients^50^. Future studies should evaluate the relationship between metabolic health and TNBC prognosis.
Taken together, our findings suggest that while biological/immunological differences with potential precision oncology implications may exist between TNBCs occurring in women who self-identify as either Black or White, these self-identified classifiers do not necessarily predict poorer survival outcomes in Black women—which are more likely to be due to access to adequate care and/or comorbidities.
Our findings highlight the need for a refined classification of TNBCs that considers socioeconomic factors, genetic admixture, stage, the phenotypic plasticity of tumor cells, and the heterogeneity of the tumor microenvironment, to better predict clinical outcomes.
Methods
Study design and data collection
We collaborated with the Louisiana Tumor Registry (LTR) and the SEER-Linked virtual tumor repository program to obtain de-identified formalin-fixed, paraffin-embedded (FFPE) tissue and clinical data for TNBC cases in Louisiana. LTR acquired and de-identified FFPE tumor blocks under a protocol deemed “exempt” by the LSUHSC IRB. The target population was TNBC cases diagnosed from 2010 to 2016 (Supplemental Table 1). Clinico-pathological parameters collected included: Age at diagnosis, race (self-reported), family history of breast cancer (if known), height, weight, diagnosis of diabetes (Y/N, if known), histological type, histological grade, of ER, PR, Her2/Neu, Ki67 (if available) and any other immunohistochemistry markers studied (e.g., CK5/6, c-Kit), AJCC stage at diagnosis, and recurrence (Y/N). Body mass index (BMI) (kg/m^2^) around the time of diagnosis was calculated from height and weight. BMI was classified as lean (18.5 ≤ BMI < 25), obese class I (30 ≤ BMI < 35), obese class II (35 ≤ BMI < 40), and obese class III (BMI ≥ 40).
Inclusion criteria were: Female breast cancer; ER, PR negative (<1% by immunohistochemistry), absence of HER2-amplification by immunohistochemistry (IHC) and/or fluorescence in situ hybridization (FISH); Age at diagnosis ≥ 18; Race: Black or White; Any AJCC stage (0, I II, III, IV); BMI lean (BMI < 25) or obese (BMI > 30); Documented surgery (lumpectomy or any partial or total mastectomy); documented NACT (Y/N). We excluded underweight (BMI < 18.5) and overweight (25 ≤ BMI < 30) women to maximize statistical power to detect differences between lean and obese women. These criteria identified 1123 cases (Supplemental Table 2).
The study design included two races (Black, White) and 4 obesity categories (lean, obese I, obese II, obese III). From the sampling frame, tissues were requested from 3 statewide pathology laboratories to establish a stratified random sampling of each race and obesity category, targeting up to 50 cases per combination. From these tissue requests, we obtained tissue from 256 cases. Three cases were censored. Two were women with bilateral tumors, and one was found to be incorrectly identified as TNBC. The final sample included 253 TNBC cases. Area Deprivation Index (ADI)^51^ was calculated from census tract data available for 251 cases.
Immunohistochemistry
Sections of FFPE tissue containing ≥ 50% tumor, as judged by the study pathologist, were cut and processed for immunohistochemistry and RNA extraction. Formalin-fixed, paraffin-embedded tissues were microtome-sectioned to a thickness of 4 μm, placed on electromagnetically charged slides (Fisher Scientific; Waltham, MA), and stained with Hematoxylin & Eosin (H&E) for routine histologic analysis. Immunohistochemistry was performed using Avidin-Biotin-Peroxidase, according to the manufacturer’s instructions (Vectastain Elite ABC Peroxidase Kit; Vector Laboratories, Burlingame, CA). Our modified protocol includes melting the paraffin at 56 °C for 15 min, clearing in xylenes 3 times for 15 min each, rehydration through descending grades of alcohol up to water, non-enzymatic antigen retrieval with 0.01 M sodium citrate buffer pH 6.0 at 95 °C for 25 minutes in a vacuum oven, and endogenous peroxidase quenching with 3% H_2_O_2_ in methanol, Following these steps, slides were washed with PBS and blocked in PBS/0.1% BSA containing 5% normal horse serum (for mouse monoclonal antibodies), or normal goat serum (for rabbit generated antibodies) for 2 h at room temperature, then incubated overnight with primary antibodies. Primary antibodies included rabbit recombinant anti-CD-3 (clone SP162, 1:150 dilution, Abcam, Fremont, CA), anti-CD-8 α (clone EPR20305, 1:200 dilution, Abcam), and an anti-FoxP3 (clone EPR15038-69, 1:50 dilution, Abcam), and mouse monoclonal anti-IL-17A (clone 4K5F6, 1:20 dilution, Abcam) and anti-CD20cy (clone L26, 1:100 dilution, DAKO, Santa Clara, CA). Breast Cancer specific biomarkers were tested with rabbit recombinant anti-estrogen Receptor alpha (clone EPR4097, 1:250 dilution, Abcam), anti-Progesterone Receptor (clone YR85, 1:250 dilution, Abcam), and anti-ErbB2/HER2 (clone SP3, 1:100 dilution, Abcam). The following day, slides were incubated with biotinylated secondary anti-rabbit or anti-mouse antibodies for 1 h at room temperature, developed using a diaminobenzidine substrate, counterstained with hematoxylin, and mounted with Permount. Images were collected at 200x and 600x magnification using a BX61 Olympus microscope equipped with a high-resolution Olympus DP74 digital camera and CellSense image capture software.
RNA extraction and library preparation
Total RNA was extracted from FFPE tissue samples using the truXTRAC FFPE total NA (column) kit (Covaris). Directions from the manufacturer were followed with a few modifications. Tissue sections of 20 μm were first deparaffinized using Sigma’s xylene substitute, followed by two ethanol washes and air dried for 20 min at 37 °C. After deparaffinization, tissues were lysed in lysis buffer containing proteinase K using the Covaris M220 focused ultrasonicator and following the recommended settings by the manufacturer, followed by an incubation at 56 °C for 30 min. RNA-containing supernatant was then de-crosslinked at 80 °C for 20 min and purified using the RNA purification columns as indicated by the manufacturer’s instructions. A treatment with DNase I (Qiagen) was included. RNA was eluted in 30 μl of H_2_O. RNA quantification was performed using the Qubit RNA HS Assay kit (Invitrogen) and RNA quality was assessed with the Agilent 2100 bioanalyzer (Agilent Technologies).
Libraries were generated using Illumina’s TruSeq RNA exome library preparation kit and according to manufacturer’s instructions, with the following modifications^52^: 300 ng of input RNA was used, PCR cycles were reduced to 9 cycles, and hybridization time was extended to 16 h. After library preparation, free adapters were blocked using Illumina’s free adapter blocking reagent. Libraries were validated on Agilent’s 2100 bioanalyzer (Agilent Technologies) using a DNA 1000 or HS DNA Kit and quantified using the Qubit dsDNA HS Assay kit (Invitrogen). For sequencing, libraries were pooled in equimolar ratios and run on Illumina’s NextSeq 500 instrument using a NextSeq 500 high-output v2 kit with pair-end 75 bp reads.
RNA sequencing
The FASTQ files were obtained from Illumina’s BaseSpace and were uploaded to Partek Flow. Contaminants were removed with Bowtie 2.2.5, aligned to STAR 2.6.1 d using hg38 as reference, and quantified with RefSeq 96 (release 2020-11-02). Raw counts were downloaded for differential expression analysis. For cell deconvolution and enrichment, the CIBERSORTx^36^ and xCell^37^ algorithms were used.
Bioinformatics analysis
Data consisting of raw sequence reads generated using RNA-Sequencing of 258 samples was transferred from the Translational Genomics Core to the Center for Bioinformatics Services. Gene expression data were quantified using Transcripts Per Kilobase Million (TPM) and were log2 (TPM+1) transformed. Patients with bilateral tumors were removed to avoid potential confounding of the results by hereditary cancer syndromes. After data preprocessing, we generated a dataset consisting of 253 samples and 28,278 genes. Gene expression data was processed and normalized using loess normalization method implemented using Bioconductor R-package LIMMA^53^. The gene expression data matrix was filtered to remove transcripts with missing data and very low expression values, as the values were noise and not informative. After data preprocessing, we generated a dataset containing 19,361 genes consisting of 128 samples from self-identified Black/African American (AA) women and 125 samples from self-identified White/European American (EA) women, which were used in the analysis. The probe IDs, gene symbols, and names were matched for interpretation using the Ensemble database, a database used for gene annotation of RNA-sequencing experiments. Using normalized data, we performed whole transcriptome analysis comparing gene expression levels between Black and White women using a paired t-test implemented in the R-package LIMMA^53^, to discover a signature of significantly differentially expressed genes distinguishing Black from White women. In addition, we performed differential expression by various variables and self-reported race. We compared gene expression at age ≥ 50, Black vs White, obese versus non-obese overall and Black vs White, stage 2-4 vs stage1, stage 2-4 Black vs White, and mortality >2 year vs ≤2 year. In addition, we performed differential expression analysis comparing 215 TNBC-type “included” versus 38 “TNBC-type “excluded” tumors, and 195 TNBC-type- included “classified” samples vs 20 TNBC-type-included “unclassified” samples. Throughout these analyses, we used the false discovery rate (FDR) procedure^54^ to adjust p-values for multiple hypothesis testing. Additionally, we computed the log2 Fold Change (Log2FC) defined as the median of tumors minus median of normal for each gene. Genes were ranked based on adjusted p-values, FDR, and LogFC. We performed unsupervised analysis using hierarchical clustering using the Pearson correlation coefficient as the distance measure between pairs of genes to characterize the patterns of expression profiles of identified sets of genes.
We performed network and pathway analysis using the Ingenuity Pathway Analysis (IPA) software package^55^. Using IPA, we mapped the set of significantly differentially expressed genes distinguishing comparison groups. We computed probability Z-scores and log p values to assess the likelihood and reliability of correctly assigning the genes to the correct molecular networks and signaling pathways, respectively. FDR was used to correct for multiple hypothesis testing in pathway analysis. The predicted molecular networks and biological pathways were ranked based on z-scores and log P values, respectively. Gene ontology (GO)^56^ analysis, as implemented in IPA, was used to classify the genes according to the molecular functions, biological process, and cellular components in which they are involved.
Genetic ancestry analysis
DNA samples were first checked for quality using the Illumina Infinium HD FFPE QC Kit, following vendor recommendations. All evaluable samples with sufficient DNA passed the QC test, and 200 ng of DNA were used to interrogate 1.8 million SNPs using the Illumina Global Diversity Array (GDA). Briefly, samples were amplified at 37 °C for 24 h, fragmented at 37 °C for 1 h, precipitated, and resuspended in buffer. Samples were hybridized to beadchips for 18 h at 48 °C. The beadchips were washed to remove unhybridized and nonspecifically hybridized DNA, and samples were subjected to a single-base extension at 44 °C and staining at 32 °C. The beadchips were washed and scanned in the Illumina iScan system. The IDAT files (green and red fluorescence) were downloaded and used for allele calling.
Global ancestry was inferred using a total of 163 ancestry-informative single nucleotide polymorphisms (SNPs) (AISNP) published by Pakstis et al.^57^. From the Illumina Genome Diversity Array, we extracted the plus-string genotypes of the 163 AISNPs for the 233 samples. Two SNPs with over 90% missing genotype were excluded. Genetic ancestry proportion was calculated using the STRUCTURE software 2.3.4^58^. under the admixture model assuming correlated allele frequencies. Based on the genotypes of 161 AISNPs, we first determined the optimal number of clusters that best fits the 1296 reference samples from 22 diverse human populations^57^ by running the program 20 times at each K level from K = 7 to K = 14 with 10,000 burn-in and 10,000 Markov Chain Monte Carlo (MCMC) iterations. Based on the optimum k-value of 11, we ran unsupervised structure analysis on the combined data from 1296 reference and 321 test samples. Because one of the 11 clusters was specific to the TNBC test samples, we also conducted supervised structure analysis as follows: 672 reference samples were selected from 8 predefined ethic groups where the majority of samples in each predefined group had high ancestry proportion (>0.9) in one of 11 clusters in the initial unsupervised analysis. The STRUCTURE program then grouped the test samples based on the pre-defined population-of-origin. In order to examine reliability of ancestry inference in the current supervised analysis, we included an additional 88 test samples with results generated using unsupervised Structure analysis in order to assess variation between the supervised and unsupervised analyses.
The overall correlation between results from the supervised and unsupervised was very high (0.68 to 0.97 across 8 different clusters), and concordant with self-reported race and ethnicity in the additional 88 test samples. Therefore, we are confident that the ancestry proportions for these 233 test samples are appropriate. In addition, we used principal components analysis, calculated using PLINK 1.946^59^, as a complementary method to assess ancestry.
Statistical analysis
Group comparisons of age at diagnosis were assessed using ANOVA. Categorical variables were compared using Fisher’s exact tests. Overall survival was visualized via Kaplan Meier plots and compared using log-rank tests and Cox proportional hazards models. Cell population data (CIBERSORTx^36^, xCell^37^) were visually summarized using heatmaps of the median values, and compared using non-parametric Kruskal–Wallis tests.
Supplementary information
Supplementary Information
The reference list from the paper itself. Each links out to its DOI / PubMed record.
- 1Coughlin, S. S. Social determinants of breast cancer risk, stage, and survival. Breast Cancer Res. Treat 10.1007/s 10549-019-05340-7 (2019).10.1007/s 10549-019-05340-731270761 · doi ↗ · pubmed ↗
- 2Song, K. et al. Epsins 1 and 2 promote NEMO linear ubiquitination via LUBAC to drive breast cancer development. J. Clin. Investig.131, 10.1172/JCI 129374 (2021).10.1172/JCI 129374 PMC 777337332960814 · doi ↗ · pubmed ↗
- 3Qiagen. QIAGEN Ingenuity Pathway Analysis (QIAGEN IPA), https://digitalinsights.qiagen.com/products-overview/discovery-insights-portfolio/analysis-and-visualization/qiagen-ipa/?cmpid=CM_QDI_DISC_0822_CLC Genomics Page (2023).
