Single base focal hypermutation cooccurs with structural variation as an early event in advanced prostate tumourigenesis with ancestry specific independence: a multi-ancestral observational study
Jue Jiang, Avraam Tapinos, Ruotian Huang, M.S. Riana Bornman, Phillip Stricker, Shingai Mutambirwa, David Wedge, Weerachai Jaratlerdsiri, Vanessa Hayes

TL;DR
This study explores how focal hypermutations in prostate cancer genomes are linked to cancer progression and differ across ancestral groups, particularly highlighting African ancestry.
Contribution
The study reveals ancestry-specific patterns of kataegis in prostate cancer and its association with clinical outcomes across multiple ancestral groups.
Findings
Kataegis is significantly associated with genomic instability and poor clinical outcomes across all ancestral groups.
SV-independent and subclonal kataegis show specific associations with African ancestry.
Kataegis-positive tumors are linked to higher prostate-specific antigen levels in African patients and increased metastatic risk in European patients.
Abstract
Kataegis, the focal hypermutation of single base positions in tumour genomes, has received little attention with regards to prostate cancer (PCa) molecular features, tumour evolution and associated clinical presentation. Most notably, the impact of this phenomenon is yet to be explored across ancestral lineages representing the extremities of PCa presentation and outcomes, with men of African ancestry disproportionately disadvantaged. The purpose of this study is to address the knowledge gap through African inclusive multi-ancestral interrogation. We assessed for ancestrally shared and unique molecular, evolutionary and clinical features of kataegis in 669 multi-ancestral whole PCa genomes. Access to raw whole-genome sequenced data allowed for direct single-pipeline comparative analysis between 109 southern African and 57 European derived treatment naïve high-risk-biased primary…
Genes, proteins, chemicals, diseases, species, mutations and cell lines named across the full text — each resolved to its canonical identifier and authoritative record.
Click any figure to enlarge with its caption.
Figure 1
Figure 2
Figure 3
Figure 4
Figure 5
Figure 6
Figure 7- —National Health and Medical Research Council (NHMRC) of Australia through a Project Grant, Ideas Grants
- —U.S.A. Congressionally Directed Medical Research Programs (CDMRP) Prostate Cancer Research Program (PCRP) Idea Development Award, TARGET Africa, HEROIC Consortium Award, HEROIC PCaPH Africa1K, U.S.A.
- —Petre Foundation via the University of Sydney Foundation
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsCancer Genomics and Diagnostics · Prostate Cancer Treatment and Research · Epigenetics and DNA Methylation
Introduction
Prostate cancer (PCa) is the most frequently diagnosed male cancer in most regions of the world, disproportionately affecting men of African ancestry and particularly from Sub-Saharan Africa [1]. Mortality rates from PCa are highest in Sub-Saharan Africa and the Caribbean, with southern Africa ranking the first globally at 29.7 (age-standardised rate per 100,000 males) [1]. Notably, the incidence rate of southern Africa is lower than that of economically stable regions, such as Australia and New Zealand (59.9 vs. 78.1, age-standardised rate per 100,000) [1]. Conversely, both incidence and mortality rates are lowest across the Asian diaspora of nations. While the disparities may be attributed to diminished access to PCa screening and medical resources or exposure to yet unknown geographic risk factors, studies from the United States have shown that African American men are at greatest risk for aggressive disease presentation and associated lethality after accounting for non-genetic factors [2, 3]. Additional studies that alluded to biological and genomic contributions are needed for a better understanding of the disparities across different ancestral populations.
Kataegis, meaning thunderstorm in Greek, describes the focal hypermutation phenomenon in cancer genomes [4]. A kataegis event is defined as a cluster of closely distributed single nucleotide variants (SNVs) and results from a single mutational action of APOBEC3A (A3A) or APOBEC3B (A3B) cytidine deaminases on exposed single-strand DNAs (ssDNAs) [4–6]. This mutational process has been linked to single base substitution (SBS) signatures, SBS2 and SBS13 [7]. Despite Pan-Cancer Analysis of Whole Genomes (PCAWG) and organ-specific studies suggesting kataegis to be frequent in cancers of the breast, bladder, lung, and skin (melanoma) [4, 5, 8, 9], the evolution of kataegis and its clinical implications remain elusive for PCa, and unclear for African patients due to a lack of African-derived whole tumour genome data [10, 11]. Controversially, breast cancer (BRCA) research reported kataegis with a favourable prognosis [12] and lower genomic instability [10, 12], with others showing a link to aggressive disease [7]. The early event of kataegis arising with chromothripsis during telomere crisis has been suggested by modified cell line experiments [13, 14], while late kataegis development is observed in PCa [15] and hepatocellular carcinoma [8]. However, the potential contribution and association of kataegis in PCa ancestral disparities are yet to be determined.
This study aims to characterise kataegis mutational processes in PCa genomes from patients of different ancestries and to assess the potential clinical implication, with a particular focus on aggressive disease in African men. We processed samples from 109 African men (Black South Africans) and 57 European men (predominantly Australians) through the same pipeline, providing a direct comparative analysis based on genetic ancestry. Across ancestries, our findings linked kataegis events with more aggressive PCa manifestations and adverse clinical outcomes. The investigation of the aetiology primarily attributed kataegis to APOBEC enzymes with variation between cancer aggressiveness among African patients. We observed ancestral disparities in the evolutionary timing of kataegis and the distribution of distances between kataegis and structural variants (SVs). These findings highlight the unique genetic factors contributing to PCa in African men and underscore the importance of including diverse ancestral populations in cancer research.
Materials and Methods
Subjects and whole genome sequencing data
Treatment naive samples of blood and tumour pairs were collected from 166 patients diagnosed with PCa recruited from South Africa (n = 113) and Australia (n = 53, Table. 1). Patient ancestry was determined using whole genome interrogation for subpopulation fraction analyses, as previously described [16]. In short, 109 patients categorised as African (all South African) with greater than 85% African ancestral fraction; 57 were categorised as European (53 Australian and 4 South African), allowing up to 3% African ancestral and 26% Asian contributions. Tumour aggressiveness was defined from histopathological Gleason Scores as the International Society of Urological Pathology (ISUP) Grade Group (GG) either at diagnosis (South Africans) or surgery (Australians). Patients presented either as low-risk (LR, GG1 and GG2) or high-risk (HR, GG3–5), with the African derived HR group biased towards very HR PCa (89%, 72/81 ISUP GG4/5). For comparison, we intentionally selected untreated biobanked samples with advanced disease for our European cohort (98%, 49/50 ISUP GG 4/5). As previously reported for South-East Africa [17], both prostate specific antigen (PSA) levels (median 82.60 vs 8.15) and age at presentation/surgery (median 69 vs 63 in HR groups) are elevated for our African over the European cohort of HR groups. The latter cohort allows for extensive follow-up data defined as biochemical relapse (BCR) and/or metastasis. All samples underwent deep WGS using the Illumina NovaSeq and Hiseq platforms (median coverages tumour 88.64 X and blood 44.19X), GRCh38 referenced variant calling and annotation, and evolutionary timing pipelines, as previously described [16, 18].
Public validation cohorts
Somatic SNVs were downloaded from published deep WGS primary tumour-normal data derived from 296 European and 207 Asian PCa donors, with available clinical data (Table. 1). European data were derived from the Prostate Adenocarcinoma Canada project via the International Cancer Genome Consortium (ICGC) Data Portal [19, 20]. Asian data were obtained from the Chinese Prostate Cancer Genome and Epigenome Atlas (CPGEA) with accession number PRJCA001124 [21]. The European data are biased towards the LR PCa, with no age differences between LR and HR cases for either European data (79%, n = 234 vs. 21%, n = 62; median of age, 64 vs. 63.5 years; Wilcoxon’s rank sum test, P = 0.58) or Asian data (35%, n = 73 vs. 65%, n = 134; the same median of age at 69 years).
Kataegis identification and evolution
Kataegis identification followed the methods of the PCAWG study, using an adjusted threshold for candidate calling, followed by two criteria (detailed in Additional file1: Supplementary methods) [5]. Briefly, inter-mutational distances of SNVs were adjusted with the piecewise constant fitting (PCF) model using the core algorithms of the kataegis package [22] with default parameters [9]. The threshold, requiring a minimum of four SNVs with the PCF-adjusted distances less than one kb, was set and derived from the total number of SNVs per patient and identical for all patients.
Kataegis events were further refined with evolutionary timing (detailed in Additional file1: Supplementary methods). As kataegis SNVs arise together from a single mutational process [5], we refined kataegis with evolution by examining each subset of SNVs that occurred during the same evolutionary epochs, including clonal (early, late, and unspecified) and subclonal epochs. This step was applied only to the current study cohort and identified a total of 249 evolutionary kataegis events in 65 patients. Evolutionary kataegis was unavailable for public cohorts due to the lack of available copy number variants (CNVs).
Statistical Analysis
Statistical tests included Fisher’s exact test for categorical variables using the stats package [23], and Wilcoxon’s rank sum test for continuous data comparisons between two ancestries or risk groups using the ggpubr package (v0.6.0) [24] in R (v 4.2.2) [23]. Ps of multiple hypothesis testing were adjusted using the false discovery rate (FDR) when specified. Four outliers with extreme kataegis burdens were excluded, including one European patient (42 kataegis events) in the study cohort, and three patients whose z-scores were greater than three in the public European cohort.
For genomic features significantly associated with the presence of kataegis, we further analysed their associations with kataegis burden with a negative binomial regression model. The negative binomial regression model was suitable to describe the kataegis burden that had many zero values and a variance greater than its mean (4.03 vs. 1.03). The analysis excluded the aforementioned outlier, and three African patients with PSA or age unavailable. Besides all the genomic features associated with kataegis, the analysis also included ancestry, patient risk levels and age at diagnosis. Log-transformation was applied to adjust data skewness found in SV burden, tumour mutational burden (TMB), chromothripsis burden, percentage of genome alteration (PGA), and copy number (CN) gain.
Co-occurrence of kataegis with cancer driver mutations
Co-occurrence of kataegis with cancer driver mutations
We examined associations between kataegis and point mutations of 58 selected genes using Fisher’s exact test (P-values and FDRs in Additional file2: Table S1). We examined previously reported top cancer drivers for PCa [16, 25] and/or genes potentially related to kataegis development, such as cell-cycle checkpoint-related genes [26], APOBEC3A, and APOBEC3B.
Survival analysis
We performed survival analyses using Kaplan-Meier estimates from the survival package (v 3.5–5) [27] and log-rank tests from the survminer package (v 0.4.9) [28]. To assess clinical progression, we compared (i) patients with BCR and/or metastasis to those with neither, and (ii) patients with metastasis only to those without metastasis or BCR. The survival distribution was compared by kataegis state (positive or negative), and by kataegis burden (elevated burden with a kataegis count > 1 or ≤1). The analysis was performed for LR and HR groups concurrently and separately for the European patients with available follow-up data from our study cohort, and for public European and Asian cohorts. From our study cohort, we excluded the small LR group of European patients (n = 7), a hyper-kataegic outlier, and three patients not curative after radical prostatectomy from the HR group (n = 42 remaining). From the validation cohorts, we excluded three outliers defined by z-scores greater than three, and patients with missing clinical follow-up from the public European cohort (n = 281 remaining). We also filtered out 21 patients with missing clinical information from the Asian cohort (n = 186 remaining).
SBS and SV signatures
Kataegic SNVs, genome-wide SBS, and SV signatures were decomposed and assigned using SigProfilerExtractor (v.1.1.24) [29]. The analysis processed kataegic SNVs from 283 kataegis positive tumours from this study and validation cohorts. The aforementioned outliers, one from the study cohort and three from the public European cohort, were excluded from the analysis. Kataegic SNVs from the public European data were lifted to GRCh38 reference using liftOver (last modified 2022–01-31) [30]. The signature identification steps included de novo signature discovery using nonnegative matrix factorisation (NMF) and the assignment of conventional Catalogue Of Somatic Mutations In Cancer (COSMIC) signatures (v3.4, Oct. 2023). We used default settings with some modifications, including a maximum of 15 signatures, 500 NMF replicates, one million maximal NMF iterations, and the GRCh38 reference. The assignment of SBS signatures was challenging for kataegic SNVs due to a small number of SNVs compared to genome-wide SNVs. To maintain the accuracy, 33 samples were filtered out from a cut-off of cosine similarity greater than 0.5. The passed samples had a median cosine similarity of 0.851 (range, 0.508–0.988). In addition, genome-wide SBS and SV signatures were identified from 165 samples, excluding a European outlier. The SigProfilerExtractor parameters and version of the COSMIC database were the same as those used for the kataegic SBS signatures. The NMF extraction methods were based on the frequency matrix of 32 SV types [31].
APOBEC attribution to kataegis
We used Fisher’s exact test to identify APOBEC-enriched kataegis, which were further tested for A3A or A3B enrichment according to the context preference of APOBEC enzymes. The identification mainly followed the method previously used for genome-wide enrichment [32]. For the APOBEC enrichment, kataegis events were compared with other non-clustering SNVs from the sample for the count of mutated cytosines in each motif (C and TCW) adjusted by the accessible rate of the motif (20 bp context). Here, we used TCW to represent the APOBEC enzyme preference motif, as observing comparable amounts of cytosine mutations in TCA and TCT, rather than a skewness toward TCA reported previously [32]. We used TCW to represent a cytosine mutation in the TCW motif, and more details of the Fisher’s exact test are in Additional file1: Supplementary methods. Further, for each APOBEC-enriched kataegis, we identified A3A-enriched kataegis with YTCW motif and A3B-enriched kataegis with RTCW motif [32, 33], where the underlined cytosine means mutated. P-values were adjusted with FDR.
Distribution of kataegis and proximal SVs
The enrichment or sparsity of SVs proximal to kataegis events was tested by comparing kataegis with simulated kataegis events. For each kataegis event (n = 831) identified in this study and validation cohorts, excluding four outliers, we simulated 1,000 pseudo kataegis events with the same event interval by randomly assigning the central position with 1,000 non-clustering SNVs from the sample. For both identified kataegis and simulated kataegis, their distances to proximal SVs were compared using log-spaced bins (0–1 kb, 1 kb – 10 kb, 10kb – 0.1 Mb, 0.1 Mb – 1 Mb, 1 Mb – 10 Mb, 10 Mb – 100 Mb, and beyond 100 Mb). For each patient group defined by ancestry and risk level, we tested enrichment or sparsity of SVs by calculating P-values based on the rank of the identified kataegis in the 1,000 simulated kataegis events. P-values were adjusted with FDR.
Results
Ancestrally independent low prevalence and burden for prostate tumour kataegis
From the study cohort including 113 Africans from South Africa, 53 Europeans from Australia, and validation cohorts 296 Europeans from Canadian, 207 Asian from China (Table 1), we identified kataegis with TMB-derived threshold and criteria based on known kataegis characteristics [5]. For the study cohort, 260 kataegis events were identified in 41% (68/166) of tumours (Fig. 1A, Additional file2: Table S2), consistent with a previous report for European patients [5]. Within the validation cohorts, we identified 321 kataegis events in 39.2% (116/296) of European and 297 events in 49.8% (103/207) of Asian patients (Additional file2: Table S3, S4). Overall, we observed a low kataegis burden (median: two events, range: 1 to 13, Fig. 1B), excluding one hyper-kataegic outlier (47 events) derived from a single European patient. The median number of SNVs of a kataegis event is six, spanning a narrow range of 2.67 kb and differing between HR groups by ancestry (African 5 SNVs vs. European 7 SNVs, Wilcoxon’s rank-sum test, FDR = 5 × 10^−4^). Kataegic regions were unique to each patient, as previously described [34], and only a few were within functional genomic regions (Additional file2: Table S5).
Kataegis is associated with genomic instability and cooccurs with cancer drivers
Kataegis-positive tumours exhibited increased genomic instability marked by various genomic features observed in one or more groups of risk levels and genetic ancestries with Wilcoxon rank sum test (Additional file1: Fig. S1). Elevated TMB, SVs, and chromothripsis were observed in kataegis-positives across ancestries and cancer aggressiveness (FDRs = 3 × 10^−5^–0.02, 7 × 10^−6^–0.002, and 2 × 10^−5^–0.002, respectively). Notably, the SV burden was the most significant factor, showing a further association with kataegis burden in both ancestral groups (negative binomial model, P = 0.001), consistent with the previous study of European patients [5]. CNVs were significantly correlated with kataegis exclusive to HR groups of African and European patients, characterised by gains (FDRs = 0.01–0.04), losses (FDRs = 2 × 10^−4^–0.002) or both as measured by PGA (FDRs = 2 × 10^−4^–0.002). Significantly shorter telomere lengths were observed in kataegis-positive tumours derived from African patients within the LR group (FDR = 0.03).
Significant co-occurrences of kataegis and cancer driver point mutations were observed using Fisher’s exact test (Additional file2: Table S1). The significant co-occurrence of kataegis and RBFOX1 in HR groups was found for both ancestries (FDRs = 7 × 10^−4^–0.003) and validated by the LR group of public European patients (n = 234, FDR = 3 × 10^−6^). Additionally, significant on-occurrence of kataegis with PDE4D, TP53, and ZFHX3 were observed in the HR group of European patients of the study cohort (FDRs = 5 × 10^−4^, 0.04, 0.005, respectively), as well as ATM, ATRX, and CHEK2 observed the LR group of the public European patients (FDRs = 0.002, 0.03, and 0.03, respectively). However, no significant co-occurrence was found in the public Asian cohort (n = 207, FDR >0.3).
Kataegis correlates with adverse PCa clinical outcomes
To study the clinical implication of kataegis, we examined the PSA level of patients, a widely used clinical measurement for PCa detection [35] and post-treatment recurrence [36]. Higher PSA levels were observed with kataegis positive tumours compared to those with negative tumours in the HR group of African patients (median: 100 vs. 43.0 ng/mL; Wilcoxon’s rank-sum test FDR = 0.002; Fig. 2A). Appreciating that high PSA levels may be an indicator of metastasis risk [35], lack of associated clinical follow-up data, including associated magnetic resonance imaging (MRI) data, limited further investigation. For the HR group of our European patients sampled at surgery, neither prevalence nor burden of kataegis was a significant predictor of BCR (Kaplan-Meier test), likely due to their small cohort size. Leveraging a larger LR-biased European PCa data resource, we showed LR patients with elevated kataegis burden (more than one event) to be significantly susceptible to metastasis (Log-rank test, P = 0.03), while observing no association for BCR (Fig. 2B).
APOBEC3B is the main aetiology for kataegis in prostate tumours
Our analysis of kataegis aetiology identified APOBEC as the primary contributing factor to kataegis across ancestries, except for the LR group of African patients. Particularly, SBS2 and SBS13 signatures accounted for approximately 80% (median: 79.1–93.6% for eight subgroups; Fig. 3A; all SBS proportions in Additional file1: Fig. S2), consistent with previous report (81.7%) [5]. Consistently, more than 50% (51.6–71.8%) of kataegis were APOBEC-enriched identified based on the motif preferences (Fig. 3B). Between ancestries, the LR group of Asian patients exhibited significantly more APOBEC-enriched kataegis than other ancestries (Fisher’s exact test, P = 0.02 and 0.04 for LR group of African and public European data, respectively). Between risk-levels, the LR group of African patients showed significantly less APOBEC-enriched kataegis than the HR group (Fisher’s exact test, P = 0.04).
After observing APOBEC as the main contribution of kataegis events, we conducted a focused comparison between the two APOBEC-related signatures SBS2 and SBS13. The predominance of SBS13 (median, 40–62% for eight subgroups) over SBS2 was observed with significance in the HR group of African patients, and in both LR and HR groups of public European and Asian data (Wilcoxon’s rank sum test, FDR = 4 × 10^−10^–0.005; Fig. 3A). Different from the other groups, the LR group of African patients showed the lowest proportion of APOBEC-related SBS2, significantly lower than the HR group (median, 0% vs. 25.4%; Wilcoxon’s rank sum test, P = 0.048).
We further attributed kataegis to the two APOBEC enzymes A3A and A3B. We observed higher proportions of A3B enrichment in all groups except for the LR group of African patients, with significance observed for larger public European LR and Asian LR/HR data (Wilcoxson’s rank sum test, FDR = 1 × 10^−4^–0.02; Fig. 3B). This differs from the previous observation in hypermutated samples where A3A was strongly associated [32], probably because our samples exhibit lower levels of APOBEC activity. This argument is supported by the observation that APOBEC-related signatures were exclusively within kataegic SNVs and not from genome-wide SNVs (Additional file1: Fig. S3). Also, our PCa patients showed no APOBEC3A and APOBEC3B germline predispositions, as previously reported in other cancers [5, 37, 38], including rs12628403 [5], rs1014971 [38]. Kataegis status was not associated with somatic CNVs in APOBEC3A and APOBEC3B genes and regions within and between the genes. These findings align with the low frequency and burden of kataegis observed in PCa.
Genomic rearrangement processes of ancestry predominant kataegis
Having observed a close association between SV and kataegis abundance, we sought to further determine their distributions across the tumour genome. Kataegis events observed across ancestries and risk levels were significantly enriched around SV breakpoints, with 50% (413/831) within a 10 kb distance, 40% (335/831) within a 1-kb distance, and 13% spanning across SV breakpoints (109/831). Comparing kataegis to simulated kataegis events (1,000 times) with randomly selected non-clustering SNVs, we defined the ranges where kataegis were significantly enriched or sparse from an SV. Kataegis were significantly enriched around SV regions with varying ranges (0–10 kb to 0–1 Mb) and sparse at distances beyond 10 Mb or 100 Mb between groups of ancestries and risk levels (simulation tests on log-spaced bins, FDR = 0.003–0.01; Fig. 4, Additional file1: Fig. S4). We categorised kataegis to be SV-associated and independent for events located within enriched and sparse regions, respectively. The two types of kataegis varied in proportions between risk levels and between African and European ancestries. More SV-associated kataegis was observed in HR over LR groups (Fisher’s exact test, public European data, FDR = 0.004), and in the HR groups of European over African patients (Fisher’s exact one-way test, P = 0.03). Focusing on SV types, chromothripsis was significantly enriched around kataegis (Fisher’s exact test, FDR = 0.04; Fig. 4), aligned with previous findings [6]. Conversely, kataegis did not occur close to translocations (FDR = 1 × 10^−4^) as shown in the European public data analysed in this study (Additional file1: Fig. S5, S6).
The analysis of genome-wide SV signatures for HR groups of the study cohort revealed an association between translocation SV type and kataegis. Compared to kataegis-negative tumours, kataegis-positives from both ancestries exhibited significantly lower proportion and less presence of the predominant SV2 signature and higher proportions and/or more presences of SV4 and SV10 (Fisher’s one-way exact test, FDR = 1 × 10^−4^–0.01; Wilcoxon’s rank sum test, FDR = 1 × 10^−3^– 9 × 10^−3^; Fig. 5; all SV signatures identified in Additional file1: Fig. S7). According to the COSMIC SV signature database [31], simple translocations and clustered translocations are the primary components of SV2 and SV4, respectively, while SV10 encompass simple rearrangements of other types. These suggest kataegis-positive prostate tumours characterised by an increase in clustered translocations alongside non-clustered SVs of other types.
Differential evolution of prostate tumour kataegis events between ancestries
We revealed the uneven rise of kataegis across different evolutionary timeframes by assigning kataegis to clonal epochs (early, late, and unspecified) and the subclonal epoch for the study cohort (Additional file1: Supplementary methods). Both ancestries showed a bias towards clonal origins (65.0% clonal kataegis, 128/197; Fig. 6). The clonal proportion of kataegis was significantly higher than that of genome-wide SNVs (median, 100% vs 68.3%; paired Wilcoxon’s rank sum test, P = 0.01), aligning with the clonal origin of chromothripsis [5] that could arise along with kataegis during telomere crisis [14]. This clonal bias of kataegis appears to be unreported in previous PCa studies [5, 15], while subclonal bias was reported for cancers with high kataegis burdens, excluding PCa [5]. Between ancestries, early clonal kataegis events were more frequent in European patients studied (EUR = 17.2% vs. AFR = 6.9%, Fisher’s exact test on HR groups, P = 0.04). In contrast, African-derived tumours exhibited an increased proportion of subclonal kataegis in both LR and HR groups; the latter showed significance when compared to the European patients (19%, Fisher’s exact test on HR groups, P = 0.002). These findings suggest ancestral specific dynamics during carcinogenesis.
Discussion
Kataegis is largely overlooked in PCa research due to its low frequency and burden compared to other cancer types, such as bladder and lung cancer [5]. To the best of our knowledge, no study has investigated the potential contribution of kataegis to ancestrally associated PCa health disparities. Here, using a unique multi-ancestral PCa resource, including southern African men representing the highest global region for PCa-associated mortality, complemented with published data [19–21], we present a detailed characterisation of kataegis features in prostate tumours, highlighting its implications in worse clinical outcomes and ancestrally different mutational processes. We observed prostate tumours exhibiting kataegis, often accompanied by cancer driver mutations and elevated genomic instability, are linked to adverse clinical outcomes. Tumours derived from African patients exhibited a higher proportion of kataegis independent of SVs and later occurrence in subclones. Among African patients, the proportion of kataegis attributed to APOBEC varied between cancer risks. These findings refine the earlier findings of ancestry-related cancer progression trajectories [16] by emphasising disparities in hypermutations, further underscoring the importance of African-inclusive investigations.
Furthermore, we propose kataegis as an indicator of adverse PCa which is independent of both ancestry and risk level. The similar prevalence of kataegis between risk levels highlights the limitation of current cancer grading which failed to detect any morphological or physical changes resulting from the interplay of kataegis, cancer drivers and genomic instability. Patients with kataegis-positive tumours may be recommended for more frequent monitoring during the remission period, as with a potentially higher metastatic risk. The heightened metastatic risk may be driven by genomic instability [39] and the two co-occurrent oncogenes of kataegis, RBFOX1 and TP53 [40, 41]. Also, PSA levels, known to be implicated in bone metastasis via the stimulation of osteoprotegerin [42], are found to rise in African patients with kataegis-positive aggressive PCa. However, our observation has challenged a previous statement that kataegis is a marker of good prognosis for BRCA [12]. The BRCA study showed significantly shorter survival time for patients with kataegis, but proposed that aging might be the driving force. Therefore, further follow-up data from African patients is required for investigation, ideally in a large cohort to exclude potential confounding by age. Besides, we propose that the implication of kataegis in prostate tumours progression is different from BRCA and other cancer types with high kataegis burden, despite sharing features including elevated genomic instability, close association with SVs, and attribution to APOBEC enzyme activity [5, 11]. Specific to prostate tumours, only the clustered mutations are attributed to APOBEC enzyme activity, mostly to A3B, and no germline predisposition effects that previously reported in BRCA [37] have been identified through variation in A3A and A3B genes.
Our African-inclusive study design has revealed ancestral disparities in kataegis development through evolutionary timing and mutational processes. Our evolutionary analyses have shown clonal kataegis predominated irrespective of patient ancestry. In particular, European ancestry has exhibited the high proportion of early clonal kataegis, indicating an implication in cancer initiation. In contrast, the subclonal kataegis identified in this study are notably biased towards African patients, regardless of clinicopathological presentation, suggesting a high level of genomic instability in cancer and, therefore, marked tumour heterogeneity and associated chemoresistance [43]. However, we acknowledge that our computational estimation of subclonal kataegis is a simplified model, further investigation with more sequencing techniques may help discern subclones and multiclonal origins, the prevalence of which is unknown to African patients.
Additionally, we describe two kataegis mutational mechanisms as SV-associated and independent, observing varying proportions by ancestry with the former significantly more frequent in European than African patients. While kataegis are attributed to APOBEC deamination of cytosines from exposed ssDNA, mostly to APOBEC3B observed in this study, the deamination may take place under different processes for the two kataegis types (Fig. 7). We speculate that SV-associated kataegis could have arisen during telomere crisis [14] and double-strand breaks (DSBs) repair mechanisms, such as break-induced replication (BIR) [44–46], as well as non-homologous end-joining (NHEJ) and alternative end-joining (A-EJ) concerning to the close association with chromothripsis [47]. The concurrence of driver mutations in TP53, a cell-cycle checkpoint gene, observed in European patients from this study supports the hypothesis that telomere crisis may result in chromothripsis-associated kataegis bypassing a cell cycle checkpoint due to checkpoint deficiency [14]. Also, significantly shorter telomere length has been observed in kataegis-positive tumours derived from the low-risk group of African patients and has been previously reported for the aggressive tumours derived from African men [48]. Conversely, we hypothesise that SV-independent kataegis, which we found to be more common in African ancestrally derived tumours, may arise on R loops in transcription bubbles or on the lagging strand of the DNA replication fork [49, 50]. The transcription and replication may be interplayed as R-loops in one of the sources that increase replication stress, leading to an elevated exposure of ssDNA at the replication fork [51]. We acknowledge, however, that our proposed hypotheses require further validation of cell experiments, such as DNA/RNA immunoprecipitation sequencing (DRIP)–R-loop experiments. Altogether, these findings suggest divergent tumour pathways to some extent between ancestries.
While this study provides novel insights into kataegis in relation to ancestries and cancer aggressiveness, several limitations must be acknowledged. The lack of relevant data has hindered further validation or investigation, although this has been mitigated by integrating public cohorts. The clinical implications of kataegis for African patients need future research due to a lack of African follow-up and validation data. More LR data derived from African patients are also required for differentiating features between cancer aggressiveness. To scrutinise the ancestrally shared and distinctive features of kataegis, we integrated publicly available PCa data from European and Asian ancestral patients. However, this study and public cohorts differ in their composition of cancer aggressiveness and variant identification pipelines. While our study cohort is biased towards very HR disease (ISUP GG4/5), the public European dataset focused on intermediate risk disease (82%, ISUP GG2/3) [20], and the public Asian data lacks ISUP GG 5 (0.5%, 1/207) [21]. Additionally, although we applied consistent methods for kataegis identification and downstream analyses, our somatic variant identification is more stringent due to filtering by a panel of normal samples. These limitations highlight not only areas for future research, but importantly underscores the need for tailored data collection and analysis.
Conclusions
The available PCa whole genome cohort remains one of the largest of its kind for the African continent and benefits from the inclusion of clinically, technically, and analytically matched non-African data, allowing for direct, unbiased comparative analyses. Using this African inclusive resource, supported by published non-African data, enabled us to discern both universal (or shared) and ancestrally unique kataegis positive prostate tumour features, particularly with regards to advanced disease. Demonstrating heightened African-specific kataegis-associated heterogeneity, our study emphasises the need for further African inclusion, specifically to elucidate the potential of kataegis and APOBEC3 enzymes as biomarkers of targeted cancer therapy. Collectively, by elucidating the occurrence of kataegis from tumorigenesis to later subclonal stage in African and European patients, we highlight the significance of different underlying mutational processes between ancestries, which provides a valuable resource for targeted therapeutic interventions and emphasises the need for continued exploration of biological behaviours and environmental exposures in African patients.
Supplementary Material
Supplementary Files
This is a list of supplementary fi les associated with this preprint. Click to download.
The reference list from the paper itself. Each links out to its DOI / PubMed record.
- 1Bray F., , Global cancer statistics 2022: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA Cancer J Clin, 2024. 74(3): p. 229–263.38572751 10.3322/caac.21834 · doi ↗ · pubmed ↗
- 2Lee K.M., , Association between prediagnostic prostate-specific antigen and prostate cancer probability in Black and non-Hispanic White men. Cancer, 2024. 130(2): p. 224–231.37927109 10.1002/cncr.34979 · doi ↗ · pubmed ↗
- 3Nair S.S., , Why do African-American men face higher risks for lethal prostate cancer? Curr Opin Urol, 2022. 32(1): p. 96–101.34798639 10.1097/MOU.0000000000000951 PMC 8635247 · doi ↗ · pubmed ↗
- 4Nik-Zainal S., , The life history of 21 breast cancers. Cell, 2012. 149(5): p. 994–1007.22608083 10.1016/j.cell.2012.04.023PMC 3428864 · doi ↗ · pubmed ↗
- 5Aaltonen L.A., , Pan-cancer analysis of whole genomes. Nature, 2020. 578(7793): p. 82–93.32025007 10.1038/s 41586-020-1969-6PMC 7025898 · doi ↗ · pubmed ↗
- 6Taylor B.J., , DNA deaminases induce break-associated mutation showers with implication of APOBEC 3B and 3A in breast cancer kataegis. elife, 2013. 2: p. e 00534.23599896 10.7554/e Life.00534 PMC 3628087 · doi ↗ · pubmed ↗
- 7Veerla S. and Staaf J., Kataegis in clinical and molecular subgroups of primary breast cancer. npj Breast Cancer, 2024. 10(1): p. 32.38658600 10.1038/s 41523-024-00640-8PMC 11043427 · doi ↗ · pubmed ↗
- 8Chen L., , Deep whole-genome analysis of 494 hepatocellular carcinomas. Nature, 2024. 627(8004): p. 586–593.38355797 10.1038/s 41586-024-07054-3 · doi ↗ · pubmed ↗
