Risk stratification for endometrial cancer: independent and joint effects of polygenic risk score and body mass index in 129,829 UK Biobank participants
Xuemin Wang, Laure Dossus, Marc J. Gunter, Emma J. Davidson, Jue-Sheng Ong, Dylan M. Glubb, Tracy A. O’Mara

TL;DR
This study shows that combining genetic risk scores with BMI improves endometrial cancer risk prediction, identifying high-risk individuals even among those with normal weight.
Contribution
The study introduces a novel approach integrating polygenic risk scores with BMI for endometrial cancer risk stratification.
Findings
The integrated model with PRS and epidemiological factors improved prediction (AUC 0.739 vs. 0.728).
Top 1% PRS individuals had a 3.06-fold increased risk with a number needed to screen of 58.
PRS and BMI independently contribute to risk, with highest risk in high BMI and top PRS tertile (HR 4.94).
Abstract
Although obesity is a well-established risk factor for endometrial cancer, its relationship with genetic susceptibility in determining cancer risk remains unexplored. Current endometrial cancer risk prediction relies primarily on epidemiological factors, with limited consideration of genetic risk. We hypothesized that integrating polygenic risk score (PRS) information with established epidemiological factors could improve risk stratification and reveal whether genetic and lifestyle factors operate independently or jointly. We generated a polygenic risk score for endometrial cancer in 129,829 unrelated female participants of European genetic ancestry (including 956 incident cases with endometrial cancer) in the UK Biobank cohort. We evaluated the prediction model performance using area under the receiver operating characteristic curves (AUCs) and assessed individual and joint…
Genes, proteins, chemicals, diseases, species, mutations and cell lines named across the full text — each resolved to its canonical identifier and authoritative record.
Click any figure to enlarge with its caption.
Figure 1
Figure 2
Figure 3- —https://doi.org/10.13039/501100007287Worldwide Cancer Research
- —https://doi.org/10.13039/501100001111Cancer Australia
- —https://doi.org/10.13039/501100000272National Institute for Health and Care Research
- —https://doi.org/10.13039/100014653Manchester Biomedical Research Centre
- —https://doi.org/10.13039/501100000925National Health and Medical Research Council
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsEndometrial and Cervical Cancer Treatments · Gynecological conditions and treatments · Genetic and Clinical Aspects of Sex Determination and Chromosomal Abnormalities
Background
Endometrial cancer is the most common gynecological cancer in developed countries, with 420,242 new cases and 97,704 new deaths estimated globally in 2022 [1]. Notably, over the last three decades, the incidence and mortality of endometrial cancer has been increasing worldwide [2].
Previous studies have used epidemiological risk factors, including age, BMI, parity, duration of oral contraceptive use, age at menarche, and age at menopause, to predict endometrial cancer status at moderate accuracy, with AUC values varying from 0.61 to 0.77 depending on the risk factors included and specific study populations [3–6]. Polygenic risk score (PRS) approaches, the cumulative dosage effects of multiple genetic variants identified by genome-wide association studies (GWAS), hold promise for disease stratification [7]. Incorporating polygenic risk scores into epidemiological models have achieved marginal improvement in the prediction of endometrial cancer development [3, 5, 6]. However, those studies have only included genome-wide significant or sub-genome-wide significant variants into the construction of the endometrial cancer PRS.
Obesity, typically measured by BMI, is the strongest known modifiable risk factor for endometrial cancer [8]. With the rising global prevalence of obesity [9], the incidence of endometrial cancer is further expected to increase. In addition to obesity and genetic variation, factors such as age, ages at menopause and menarche, parity and endogenous sex hormone levels also affect endometrial cancer risk [8, 10–19]. However, current endometrial cancer risk prediction relies primarily on epidemiological factors, with limited consideration of genetic risk. Integrating polygenic risk information with established epidemiological factors could substantially improve risk stratification and reveal whether genetic and lifestyle factors operate independently or jointly. Furthermore, identifying high-risk individuals beyond those screened for Lynch Syndrome could enable targeted prevention and screening strategies in clinical practice. This study addresses these gaps by as follows: (1) developing an integrated prediction model incorporating both PRS and epidemiological risk factors, (2) quantifying the independent and joint contributions of genetic risk and BMI to endometrial cancer development, and (3) evaluating whether genetic risk stratification could complement current clinical screening approaches in the UK Biobank cohort.
Methods
Study population
A flowchart outlining the study design and population is presented in Fig. 1. This cohort study was based on data from the UK Biobank, which is a prospective cohort with extensive phenotypic and genotypic data for over 500,000 UK participants aged 40–70 at enrolment. Details of the UK Biobank can be found in Bycroft et al. [20, 21]. Briefly, participants were genotyped using either UK BiLEVE Axiom Array (807,411 genetic variants) or UK Biobank Axiom Array (825,927 genetic variants). Genetic variants were imputed using the 1000 Genomes phase 3, the UK10K, and the Haplotype Reference Consortium datasets as the imputation reference panels, which resulted in 93,095,623 autosomal single nucleotide polymorphisms (SNPs), short indels and large structural variants and 3,963,705 variants on the X chromosome.Fig. 1. Flowchart for the selection of study participants in the UK Biobank and directed acyclic graph of the study design. (A) Flowchart of inclusion and exclusion criteria. (B) Flowchart of the generation of endometrial cancer polygenic risk scores (PRSs) and evaluation of its prediction of endometrial cancer status. (C) The directed acyclic graph for assessment of endometrial cancer incidence in different BMI and endometrial cancer PRS groups. QCs included filtered out SNPs with MAF < 0.01, multi-allelic variants, missing genotype rate in > 10% of samples, departing from Hardy–Weinberg Equilibrium (P < × 10^−6^), and low imputation quality (< 0.4). ^*^4475 females (26 cases and 4449 non-cases) within missing values of BMI, age at menarche, number of live births, or ever taken OC pill were excluded
Selection of endometrial cancer cases and cancer-free participants
Following the exclusion of withdrawn participants, we restricted analyses to unrelated 181,201 female UK Biobank participants of European genetic ancestry, defined using both self-reported “White British” ancestry and genetic principal component clustering (UK Biobank Data Fields 22,006 and 22,020) created by Bycroft et al. [20] (Fig. 1A). While this approach reduces residual population structure and improves internal validity, it does not capture the full spectrum of human genetic diversity. In accordance with National Academies Science Engineering Medicine (NASEM) guidelines [22], we emphasize that the “White British” descriptor reflects a socially constructed category linked to the specific reference datasets used and should not be interpreted as a discrete or genetically homogenous population group. The use of this descriptor and principal component-based selection inherently limits the generalizability of our findings to other populations. Validation and, if necessary, recalibration of risk models in ancestrally and culturally diverse cohorts will be essential to ensure equitable and accurate risk prediction.
A total of 1867 endometrial cancer cases were identified among the 181,201 unrelated females of European genetic ancestry using the International Classification of Disease 10 (ICD10) subcategory C54.1 (malignant neoplasm of corpus uteri, Endometrium) in UK Biobank Data Fields 40,001 (underlying (primary) cause of death), 40,002 (contributory (secondary) causes of death), 40,006 (Type of cancer), 41,202 (diagnoses–main), 41,204 (diagnoses–secondary), and 41,270 (diagnoses). Diagnosis date was defined as the time of the first endometrial cancer record. From the unrelated females of European genetic ancestry, 133,322 females who had an intact uterus (no hysterectomy) and no prior cancer diagnosis (except non-melanoma skin) were included as controls. Prior cancers were identified based on ICD9 (40,013, 41,203, and 41,271); ICD10 (40,001, 40,002, 40,006, 41,202, 41,204 and 41,270); and self-reported cancers (20,001). We restricted analyses to 982 incident endometrial cancer cases. This included the removal of 566 prevalent cases who were diagnosed before recruitment, 240 cases without known date of diagnosis, and 79 cases who were diagnosed within 12 months to mitigate issues related to delayed diagnosis or delayed linkage to cancer registry.
To evaluate endometrial cancer risk prediction models (Fig. 1B, C), we additionally excluded 4475 females (26 cases and 4449 non-cases) with missing values of BMI, age at menarche, number of live births, or ever taken oral contraceptive (OC) pill, resulting in 956 cases and 128,873 cancer-free cohort participants.
Generation of endometrial cancer PRS
GWAS summary statistics on endometrial cancer risk were sourced from the latest Endometrial Cancer Association Consortium (ECAC) GWAS analysis (12,906 cases and 108,979 controls), which included 636 endometrial cancer cases and 62,853 cancer-free female controls from the UK Biobank [13]. To avoid potential bias due to sample overlap between the GWAS dataset and the PRS validation dataset, the ECAC GWAS summary statistics were derived again by excluding UK Biobank samples, resulting in 12,270 endometrial cancer cases and 46,126 controls (Fig. 1B) [23].
All GWAS variants were directly genotyped or well imputed (imputation score > 0.4) and had a minor allele frequency (MAF) > 1%. Posterior effect sizes for genome-wide variants were derived using the SBayesR [24] method implemented in genome-wide complex trait Bayesian (GCTB) software [25]. The linkage disequilibrium (LD) matrix was computed based on 1.1 million HapMap 3 variants using a banded matrix with a window size of 3 cM per SNP in a random sample of 50,000 unrelated UK Biobank samples (https://cnsgenomics.com/software/gctb/#LDmatrices).
For UK Biobank individual-level genotypic data, standard GWAS quality controls were conducted to select genetic variants and samples for endometrial cancer PRS generation following the guidelines outlined by Choi et al. [26] (see https://choishingwan.github.io/PRS-Tutorial/). Briefly, genetic variants with a MAF < 0.01, a missing genotype rate exceeding 1%, or departing from Hardy–Weinberg Equilibrium (P < 1 × 10^−10^), were excluded, leaving 8,947,018 variants available for potential construction of the endometrial cancer risk PRS. All samples had no more than 10% missing genotypes. Per-individual PRS was calculated as the genome-wide sum of the per-variant posterior effect size multiplied by allele dosage using PLINK (https://www.cog-genomics.org/plink/) [27]. PRS values for individual females were centered to zero.
Selection of established endometrial cancer risk factors
Drawing on insights from epidemiological and Mendelian randomization studies of endometrial cancer [11, 18, 28, 29], we selected and incorporated seven key risk factors into our prediction models for endometrial cancer status (case or non-case). The selected risk factors were as follows: measured BMI (UK Biobank data field 21,001), reported age at menarche (2714), age at natural menopause (3581), number of live births (2734), ever taken OC pill (2784), and levels of sex hormone binding globulin (SHBG; 30,830) and testosterone (30,850) during the initial assessment visit. The proportion of participants with missing information for variables was very low for BMI, age at menarche, number of live births, and ever taken OC pill (missingness range 0.3%–2.9%). The proportion of missing was greater for SHBG (14.3%) and testosterone levels (19.5%) likely due to issues associated with biospecimen processing, including sample quantity and quality failures. Age at menopause was missing for 3.4% of participants and not available for 42.6% of participants because they had not reached menopause.
In order to generate risk models that would be useful pre-menopausal, we generated a polygenic score (PGS) for age at menopause using the GWAS summary statistics provided by Ruth et al. (2021) [17, 30] and the same procedure as for the derivation for endometrial cancer PRS described above. We used age at menopause PGS in endometrial cancer risk prediction and in the analysis of associations of endometrial cancer with BMI and PRS. We similarly leveraged PGS methods for SHBG and testosterone levels. The use of PGS for these factors in risk prediction models will allow for replication in other cohorts that typically do not have these measured. PGS for levels of SHBG and testosterone were generated using the female-stratified GWAS summary statistics provided by Ruth et al. [29, 31, 32] and the same procedure as for the derivation for endometrial cancer PRS. The resulting PGS values for SHBG and testosterone instead of their measured values were included as covariates to assess the performance of our prediction models and in the analysis of associations of endometrial cancer with BMI and PRS. To address potential circularity or bias from including PGS derived from partially overlapping datasets, we performed a sensitivity analysis repeating the main analysis using measured SHBG and testosterone levels and reported age at menopause in place of their corresponding PGS. These sensitivity models were constructed within the same UK Biobank cohort (557 incidence cases and 50,303 controls with all measurements available) to evaluate whether the inclusion of genetically proxied traits materially affected associations or prediction performance.
Age at initial assessment (UK Biobank data field 21,003) and the top 10 genetic principal components (PCs) were included in all prediction models as covariates. To estimate genetic PCs, a genetic relationship matrix among individuals was created using HapMap3 variants and the GCTA-GREML method [33]. The genetic relationship matrix was used to derive genetic PCs using the GCTA software (version 1.94.1) [34].
Evaluation of PRS model performance and established risk factors
We assessed the predictive performance of the endometrial cancer PRS model in the UK Biobank using logistic regression to calculate the Nagelkerke’s R^2^ and variance on the liability scale explained by PRS as described previously [35]. We converted the observed R^2^ to the liability scale assuming an endometrial cancer population prevalence of 3%. Participants were divided into percentiles based on their endometrial cancer PRS distribution, and their estimated odds ratios (ORs) for endometrial cancer risk were calculated using the middle two deciles (40%–60%) as the reference group. AUCs were reported for the endometrial cancer PRS and the following: (1) each of the seven established endometrial cancer risk factors; (2) the epidemiologic model comprising all seven established endometrial cancer risk factors; and (3) an integrated model combining the epidemiological model with the endometrial cancer PRS. All analyses were adjusted for age at initial visit and the top 10 genetic PCs. We used stratified bootstrap with 2000 replicates to compute their corresponding 95% confidence intervals (CIs). Sample size adequacy was assessed using the events-per-variable (EPV) approach [36]. With 982 endometrial cancer cases and 8 predictors in the integrated model (7 epidemiological variables plus PRS), our study achieved an EPV of 122.8, exceeding the minimum threshold of 10 events per variable recommended by Peduzzi et al. [36]. Number needed to screen (NNS) was calculated as the reciprocal of absolute risk for each PRS percentile to provide clinically interpretable risk stratification metrics [37]. We calculated the net reclassification index (NRI) to quantify the improvement in the reclassification of endometrial cancer cases and non-cases by the integrated model as compared to the traditional epidemiological model (i.e. without PRS). NRI was interpreted as the net proportion of participants with improved risk classification, following established guidance for clinical prediction model evaluation [38].
BMI and PRS associations with endometrial cancer
UK Biobank incident endometrial cancer cases and non-cases were categorized by BMI (BMI < 25 kg/m^2^, 25 kg/m^2^ ≤ BMI < 30 kg/m^2^, and BMI ≥ 30 kg/m^2^) or endometrial cancer PRS tertiles from non-cases. We evaluated associations of BMI and endometrial cancer PRS with endometrial cancer status using Cox proportional hazard models. We tested the proportional hazards assumption for covariates included in a model fit by testing for independence between the scaled Schoenfeld residuals and time. P-values for trend were estimated using endometrial cancer PRS and BMI as continuous variables. The multivariable models were adjusted with each other for BMI groups and endometrial cancer PRS tertiles and additionally accounted for age at menarche, number of live births (as a categorical factor), ever taken oral contraceptive pill, PGS for age at menopause, PGS for SHBG levels, PGS for testosterone levels, age at initial assessment, and the top 10 genetic PCs. We additionally performed a sensitivity analysis where multivariable models were adjusted with each other for BMI and endometrial cancer PRS as continuous variables. Follow-up time was calculated from the baseline date (date when attended assessment center during their initial visit) to the date of endometrial cancer diagnosis or death (whichever occurred first). Cumulative incidence rates of endometrial cancer during the follow up period were generated for the different BMI and endometrial cancer PRS groups. We tested the interactions between BMI and endometrial cancer PRS by adding an interaction term in the multivariable model.
We further explored the joint associations of BMI groups and PRS tertiles with endometrial cancer using the Cox proportional model accounting for all covariates mentioned above. A nine-group comprehensive variable was thus formed, and the first PRS tertile and a normal weight (BMI < 25 kg/m^2^) group was considered as the reference group. We used a Z-test comparison to assess for differences in PRS effect between groups, as per Reay et al. [39].
All analyses were performed using R software version 4.2.2 (https://www.R-project.org/). Distribution of baseline characteristics was assessed by descriptive statistics and compared between case and non-case groups using χ^2^ tests for categorical variables and t-tests for continuous variables. Cox proportional hazards ratio models were conducted using the survival (version 3.5–5) and cumulative incidence rate plots were generated using the survminer (version 0.4.9) R packages. NRI calculations were performed using the PredictABEL R package (version 1.2–4) [40]. AUC were assessed using the pROC (version 1.18.0) R package [41]. Statistical tests were two-sided and statistical significance was evaluated at P < 0.002 after Bonferroni correction for 25 post hoc comparisons. The correction encompasses the primary endometrial cancer PRS association, PRS decile contrasts, model AUC comparison, BMI- and PRS-stratified risk analyses, and joint BMI × PRS cell contrasts. P-values for baseline characteristic were considered descriptive. Data analysis was conducted from February 22 to Sept 21, 2023.
Results
Baseline characteristics of study population
This analysis comprised 134,304 unrelated female participants of European genetic ancestry, including 982 cases of endometrial cancer and 133,322 cancer-free cohort participants (Fig. 1A). Consistent with established knowledge, endometrial cancer cases, in comparison to non-cases, exhibited older age at initial assessment, earlier age at menarche, later age at menopause, lower SHBG levels, higher testosterone levels, and a lower frequency of oral contraceptive pill use (Table 1; Additional File 1: Table S1). While the mean number of live births did not significantly differ between cases and controls, a higher proportion of cases were nulliparous compared to non-cases (22.6% vs. 18.9%). As anticipated, we observed a higher prevalence of obesity (BMI ≥ 30 kg/m^2^) among participants with endometrial cancer compared to non-cases (P < 2.2 × 10^−16^; Table 1). Although several baseline variables reached statistical significance, many of these reflect small absolute differences that are not clinically meaningful and are expected given the large cohort size. While differences such as BMI and oral contraceptive pill use reflect large, clinically relevant associations, variables with smaller differences were nevertheless included as covariates to ensure proper control for confounding in subsequent multivariable analyses. Table 1. Baseline characteristics of participants in the UK BiobankBaseline characteristicsOverallCasesControlsP-valueNumber of female participants13,4304982 (0.7%)133,322 (99.3%)Age at initial assessment, years (SD)55.7 (8.0)59.5 (6.6)55.7 (8.0) < 2.2 × 10^−16^BMI (SD), continuous26.9 (5.1)30.5 (6.9)26.9 (5.1) < 2.2 × 10^−16^BMI categories^a^ BMI < 25 kg/m^2^55,090 (41.0%)210 (21.4%)54,880 (41.2%) < 2.2 × 10^−16^ 25 kg/m^2^ < = BMI < 30 kg/m^2^48,703 (36.3%)340 (34.6%)48,363 (36.3%) BMI > = 30 kg/m^2^30,128 (22.4%)427 (43.5%)29,701 (22.3%) Missing (%)383 (0.3%)5 (0.5%)378 (0.3%) Age at menarche, years (SD)13.0 (1.6)12.7 (1.6)13.0 (1.6) < 9.5 × 10^–10^ Number of females who reported their age at menarche (%)130,449 (97.1%)961 (97.9%)129,488 (97.1%) Missing (%)3855 (2.9%)21 (2.1%)3834 (2.9%) Age at menopause, years (SD)50.4 (4.4)51.7 (4.3)50.4 (4.4) < 2.2 × 10^−16^ Number of post-menopausal females (%)72,459 (54.0%)754 (76.8%)71,705 (53.8%) Number of pre-menopausal females (%)57,218 (42.6%)184 (18.7%)57,034 (42.8%) Missing (%)4627 (3.4%)44 (4.5%)4583 (3.4%) Age at menopause PGS^b^ (SD)0 (1.43)0.17 (1.44)0 (1.43)2.1 × 10^−4^ SHBG levels (nmol/L; SD)62.5 (30.7)50.6 (26.0)62.6 (30.7) < 2.2 × 10^−16^ Number of females that had detectable SHBG levels115,035 (85.7%)854 (87.0%)114,181 (85.6%) Missing (%)19,269 (14.3%)128 (13.0%)19,141 (14.4%) SHBG PGS^b^ (SD)0 (0.22)−0.05 (0.23)0 (0.22)8.2 × 10^−11^ Testosterone levels (nmol/L; SD)1.1 (0.6)1.2 (0.6)1.1 (0.6)5.4 × 10^−7^ Number of females that had detectable testosterone levels108,095 (80.5%)820 (83.5%)107,275 (80.5%) Missing (%)26,209 (19.5%)162 (16.5%)26,047 (19.5%) Testosterone PGS^b^ (SD)0 (0.34)0.05 (0.36)0 (0.34)2.8 × 10^−6^ Number of live births (SD), count1.8 (1.2)1.7 (1.2)1.8 (1.2)0.2Number of live births categories^a^ 0 (%)25,428 (18.9%)222 (22.6%)25,206 (18.9%)1.1 × 10^−2^ 1 (%)17,798 (13.3%)118 (12.0%)17,680 (13.3%) > 1 (%)90,999 (67.8%)642 (65.4%)90,357 (67.8%) Missing (%)79 (0.1%)079 (0.1%)Ever taken oral contraceptive pill^a^ Yes (%)112,428 (83.7%)712 (72.5%)111,716 (83.8%) < 2.2 × 10^−1^ No (%)21,645 (16.1%)270 (27.5%)21,375 (16.0%) Missing (%)231 (0.2%)0231 (0.2%)Abbreviations: SD Standard deviation, BMI Body mass index, PGS Polygenic score, SHBG Sex hormone binding globulin^a^Percentages may not add up to 100% due to rounding^b^PGS (polygenic scores) of age at menopause, SHBG, and testosterone were shifted to mean zero
Integration of PRS and established risk factors for the prediction of endometrial cancer
The endometrial cancer PRS was strongly associated with endometrial cancer risk amongst unrelated participants with European genetic ancestry (P-value < 2.2 × 10^−16^) (Additional File 2: Figure S1). The Nagelkerke’s R^2^, a pseudo-R^2^ statistic measuring proportion of variance explained by endometrial cancer PRS, was 0.9% and variance on the liability-scale explained by endometrial cancer PRS was 1.9%. Risk stratification analysis revealed meaningful differences across PRS percentiles. Participants in the top 1% of the distribution had an absolute endometrial cancer risk of 1.71%, corresponding to an NNS of 58. The top 10% showed 1.05% absolute risk with NNS of 95. Compared to the middle quintile of participants (40–60%), participants in the top 10% and top 1% of the PRS distribution had a 1.98-fold (95% CI 1.58–2.49; P = 3.03 × 10^−9^) and 3.06-fold (95% CI 1.97–4.76; P = 7.10 × 10^−7^) increased risk of developing endometrial cancer respectively (Table 2). The PRS model predicted endometrial cancer status at an AUC of 0.671 (95% CI 0.654–0.687). While this was slightly higher than for six of the established endometrial cancer risk factors (AUC range 0.648–0.659), it was lower than the prediction accuracy for BMI (AUC = 0.713; 95% CI 0.696–0.729) (Table 3). The integrated model achieved superior discrimination compared to the epidemiological model alone (AUC = 0.739 vs 0.728; difference = 0.011; P = 3.98 × 10^−5^), representing a 1.5% relative improvement. The continuous net reclassification improvement for endometrial cancer prediction was 0.25 (95% CI 0.19–0.31; P < 1 × 10^−4^), indicating that for every 1000 participants screened, the integrated model would provide net improved risk classification for 250 participants compared with the epidemiological model alone. Table 2. Risk of endometrial cancer across polygenic risk score (PRS) percentiles, including absolute risk and number needed to screen (NNS)PercentileOR95% CIP-valueCases (n)****Non-cases (n)****Absolute risk (%)****NNS0–10%0.640.47 - 0.890.0074913,3820.3727410–20%1.040.79 - 1.370.787913,3510.5917020–30%1.080.82 - 1.410.588213,3480.6116430–40%1.160.89 - 1.510.278813,3430.6615340–60% (Reference)1NANA15226,7080.5717760–70%1.601.26 - 2.031.3 × 10^-^^4^12113,3100.9011170–80%1.651.30 - 2.093.5 × 10^−5^12513,3050.9310780–90%1.801.42 - 2.277.5 × 10^−7^13613,2941.019990–100%1.981.58 - 2.493.0 × 10^−9^12711,9601.059599–100%3.061.97 - 4.767.1 × 10^−7^231,3211.7158Endometrial cancer risk by decile of polygenic risk score (PRS) among 129,829 unrelated UK Biobank women of European genetic ancestry, with the 40–60th percentile used as the reference category. The number needed to screen (NNS) was calculated as the reciprocal of absolute risk, estimating how many women would need to be screened to detect one case of endometrial cancerAbbreviations: OR Odds ratio, CI Confidence interval, NNS Number needed to screenTable 3Comparison of endometrial cancer prediction model performance using area under the curve (AUC), sensitivity, and specificity**Prediction ModelAUC95% CI****Sensitivity (%)****Specificity (%)****PPV (%)**NPV (%)**Endometrial cancer PRS0.6710.654–0.68766.058.81.299.6BMI (continuous)0.7130.696–0.72963.967.61.499.6Number of live births0.6480.632–0.66460.560.11.199.5Age of menarche0.6550.639–0.67061.860.71.299.5Age of menopause PGS0.6480.632–0.66462.958.91.199.5OCP use (never/ever)0.6520.637–0.66862.858.91.199.5SHBG level PGS0.6590.644–0.67565.058.21.199.6Testosterone level PGS0.6520.636–0.66861.860.11.199.5Epidemiological model0.7280.712–0.74468.364.71.499.6Integrative model0.7390.723–0.75463.371.61.699.6Discriminative performance of endometrial cancer risk factors and prediction models in UK Biobank. Epidemiological model incorporated established risk factors (BMI, age at menarche, parity, oral contraceptive use, menopause age, SHBG, and testosterone); integrative model combining PRS with the epidemiological model. All models were adjusted for age at initial visit, assessment centre and the first ten principal componentsAbbreviations: AUC Area under the receiver operator curve, CI Confidence interval, PPV Positive predictive value, NPV Negative predictive value, PRS Polygenic risk score, BMI Body mass index, PGS Polygenic score, OCP Oral contraceptive pill, SHBG Sex hormone binding globulin
Associations of BMI and PRS with endometrial cancer risk
The proportional hazards regression model assumes that the ratio of hazards between two groups is constant over time. There was no evidence to support violation of the proportional hazard assumption in the current analysis (global test P = 0.08). Cumulative incidence curves indicated that only participants in the top tertile of the PRS distribution displayed increased risk compared to the bottom and middle tertiles; whereas higher cumulative incidence rate was associated with both the overweight (25 kg/m^2^ ≤ BMI < 30 kg/m^2^) and obese (BMI ≥ 30 kg/m^2^) groups (Fig. 2A). BMI and PRS were independently associated with endometrial cancer risk (Fig. 2B). In the model mutually adjusted for BMI groups and PRS tertiles, compared with the bottom PRS tertile, the top PRS tertile displayed an increased risk (1.71-fold) for endometrial cancer (95% CI, 1.45–2.00; P = 3.7 × 10^−11^). For BMI, overweight and obese groups presented a 1.56-fold (95% CI, 1.31–1.86; P = 7.2 × 10^−7^) and a 3.03-fold (95% CI, 2.55–3.59; P = 4.7 × 10^−37^), increased risk of endometrial cancer, respectively, compared with normal BMI group (Fig. 2B).Fig. 2. Cumulative incidence curves and multivariable-adjusted effects of endometrial cancer polygenic risk score (PRS) and BMI on endometrial cancer risk. Cumulative incidence curves were drawn for PRS tertiles (A) and BMI groups (B) in the UK Biobank. The unit of follow-up time was months, and the start point was defined as the 12th month (1 year after recruitment). C Multivariable models were adjusted for either PRS or BMI group and additionally adjusted for age at menarche, number of live births, ever taken oral contraceptive pill, and PGS values of age at menopause and levels of SHBG and testosterone, as well as age at initial assessment and the top 10 genetic principal components
When endometrial cancer PRS and BMI were treated as continuous variables, an increase of one standard deviation of endometrial cancer PRS was associated with 2.13-fold risk of endometrial cancer (95% CI, 1.79–2.54; P = 2.52 × 10^−17^), while a one standard deviation increase of BMI (5.13 kg/m^2^) was associated with a 1.59-fold risk (95% CI, 1.52–1.67; P = 2.52 × 10^−80^). There was no evidence of interaction between BMI and endometrial cancer PRS (P for interaction = 0.39). Upon stratification by BMI group, we observed that only those with the greatest polygenic load (i.e. the top PRS tertile) in each group had an increased risk of endometrial cancer risk. Participants in the top PRS tertile and the highest BMI group exhibited the greatest risk (HR = 4.94; 95% CI, 3.65–6.68; P = 7.67 × 10^−25^; Fig. 3); the relatively wide confidence intervals likely reflecting fewer incident cases in this subgroup and additional variability from stratifying on two correlated risk factors. Even for participants with a normal BMI, those in the top PRS tertile had a 2.01-fold increased risk (95% CI, 1.45–2.78; P = 2.50 × 10^−5^), compared with the bottom PRS tertile. Supporting the independent effects of BMI and PRS on endometrial cancer risk, the effect of the PRS in the highest BMI group was not significantly different to the PRS effect in the lowest BMI group (Z-score for difference in effect sizes = 1.59; P-value = 0.11). Increased risks for the top PRS tertile persisted after additionally adjusting for continuous BMI (Additional File 2: Figure S2).Fig. 3. The joint association of genetic risk and BMI with endometrial cancer. Multivariable models were adjusted for age at initial assessment, age at menarche, number of live births, ever taken oral contraceptive pill, PRS values of age at menopause and SHBG and testosterone and the top 10 genetic principal components. The dashed vertical line indicates a Hazard ratio of 1
In sensitivity analyses, replacing PGS for SHBG, testosterone, and age at menopause with their directly measured phenotypes, the results were highly consistent with our main findings. The integrated model combining the endometrial cancer PRS with measured epidemiological factors achieved an AUC of 0.753 (95% CI 0.733–0.773) compared with an AUC of 0.736 (CI 95% 0.715–0.757) for the epidemiological model alone (P = 7.9 × 10^−5^), mirroring the results using PGS (AUC = 0.739 vs 0.728; Table 3). Joint BMI × PRS assessment was also unchanged. The full results are provided in Additional File 3: Supplementary Note.
Discussion
In this cohort study of the UK Biobank, our initial focus was on assessing the predictive performance of endometrial cancer PRS and established risk factors. Firstly, we generated a risk distribution for PRS, revealing that individuals in the top 1–10% of the PRS had a risk comparable to that associated with a first-degree family history of endometrial cancer [42, 43]. The incorporation of PRS with the epidemiological risk factors in a risk prediction model led to a modest improvement in performance compared to the model that included the non-genetic risk factors alone. Subsequently, we evaluated associations of BMI and PRS with endometrial cancer, finding a joint association with endometrial cancer risk, with participants with high BMI who also had the highest PRS exhibiting the greatest risk. Participants in the top PRS tertile experienced a lower risk if they were not obese. Conversely, individuals in the top PRS tertile had increased endometrial cancer risk compared to other tertiles, even when they had a normal weight. These findings highlight the potential use of endometrial cancer PRS to identify high-risk individuals in the UK Biobank, which is particularly significant given the lack of clinical guidelines for endometrial cancer screening in the general population [44]. Indeed, current screening recommendations are primarily targeted at women with or at risk of Lynch syndrome who account for only ~ 3% of endometrial cancer cases [45, 46].
Previous studies have shown that PRS alone provides limited benefit in population screening, individual risk prediction, and population risk stratification [47, 48]. However, disease risk stratification can be improved through the integration of PRS and other risk factors [49], as we have also demonstrated. One more appropriate application of PRS is to facilitate personalized cancer screening and management practices. For instance, inclusion of PRS into breast cancer risk estimation may reduce screening and enable more personalized risk management strategies for CHEK2 and ATM pathogenic variant carriers [50]. PRS has also been shown to add value in distinguishing type 1 from type 2 and monogenic diabetes in adults with a pre-existing diagnosis of diabetes and in selecting the best treatment for different types of diabetes [51]. As we observed marginal improvement by incorporating endometrial cancer PRS with established risk factors, endometrial cancer PRS may aid in stratifying individuals with Lynch syndrome pathogenic variants.
To the best of our knowledge, this is the first study to quantify the association of BMI with endometrial cancer risk across different genetic risk levels defined by PRS. Our results suggest that, irrespective of PRS, endometrial cancer risk is positively associated with BMI and can be substantially mitigated by weight reduction. An analysis of the Women’s Health Initiative observational study found that women experiencing weight loss of ≥ 5% was linked to a 29% decrease in endometrial cancer risk, and intentional weight loss of ≥ 5% in females with obesity was associated with 56% lower endometrial cancer risk [52]. A separate meta-analysis of 13 published studies confirmed a similar association between intentional weight loss and reduced endometrial cancer risk; additionally, the analysis revealed that bariatric surgery was associated with a remarkable 59% reduction in endometrial cancer risk [53].
The findings that endometrial cancer risk associates with a higher PRS after accounting for established risk factors suggest that endometrial cancer can progress via other mechanisms. Furthermore, although obesity is often defined by BMI, it does not fully represent the distribution of body fat within the human body. In a recent study, it was reported that BMI formed a distinct cluster of fat distribution traits, such as visceral-to-subcutaneous adipose tissue ratio, waist-to-hip ratio without adjustment for BMI, and waist adjusted for BMI. The study further revealed that one unit increase in waist-to-hip ratio or visceral-to-subcutaneous adipose tissue ratio was associated with elevated hazard ratio for endometrial cancer [54]. Therefore, including other aspects of obesity such as fat distribution and other risk factors not assessed in this study such as duration of hormonal therapy usage, LDL-cholesterol levels, circulating estrogen levels, sedentary behaviors, and smoking (summarized in [10, 11]) may further improve our capability to predict endometrial cancer in the general population.
The observed risk stratification has practical implications for endometrial cancer prevention. The 3.06-fold increase in the top 1% PRS represents a substantial genetic risk factor affecting 1% of the population, compared to mismatch repair pathogenic variants associated with Lynch Syndrome, which affects approximately 0.3% of the general population [55] yet account for only ~ 3% of endometrial cancer [42, 43]. This genetic risk stratification could identify high-risk individuals beyond the current focus on Lynch Syndrome screening. The NNS of 58 for the highest-risk PRS group compares favorably to established fecal occult blood test screening programs for colorectal cancer, where Australian data suggest an NNS of approximately 33 to detect a colorectal cancer case via colonoscopy among participants testing positive for microscopic blood in the stool [56].
We acknowledge that using PGS derived from partially overlapping datasets may introduce bias through circularity or collider mechanisms. However, our sensitivity analyses using directly measured hormone levels and reported age at menopause produced results consistent with those obtained using PGS, supporting that any potential overlap did not materially influence the associations observed.
Our study has several limitations. First, our findings lack external validation in an independent cohort, limiting the assessment of model generalizability outside the UK Biobank population. While the large sample size and robust EPV of 122.8 provides confidence in model stability, future work should validate both the PRS and the integrated model in external cohorts. Second, while the statistically significant improvement in the integrated model AUC is modest, the clinical impact of this enhancement is supported by a continuous NRI of 0.25, equivalent to ~ 250 participants correctly reclassified per 1000 screened, underscoring the value of adding genetic risk information to traditional epidemiological models. Future studies, such as decision curve and cost-effectiveness analyses, could further evaluate the clinical utility of this integrated model. Third, our analysis was restricted to participants of European genetic ancestry, defined using UK Biobank’s combined self-reported and genetic criteria, to minimize confounding from population structure. However, as emphasized in the NASEM report [22], these population descriptors do not reflect discrete or genetically homogenous groups, but are shaped by sampling frameworks and sociocultural context. This approach enhances internal validity but restricts generalizability; predictive performance may attenuate, and risk associations may differ in non-European or admixed populations due to underlying differences in allele frequencies, linkage disequilibrium, and genetic architecture. We fully acknowledge that principal component analysis and self-reported ancestry are imperfect proxies for human genetic diversity and may mask important variation, as outlined in current guidelines. Validation and recalibration of polygenic risk scores in ancestrally and culturally diverse cohorts are essential to ensure equitable and clinically useful risk prediction. Fourth, BMI was assessed at a single timepoint, which may misclassify exposure and attenuate associations if weight changed during follow-up; future analyses using repeated BMI assessments in UK Biobank with time-updated Cox models, longitudinal weight-change trajectories, or joint models of longitudinal BMI and incident endometrial cancer could better capture dynamic adiposity and refine risk estimates.
Conclusions
The findings of this study demonstrate that an endometrial cancer prediction model incorporating both epidemiological risk factors and PRS has the best performance. The improvement of prediction of endometrial cancer by PRS is anticipated to increase as additional genetic risk variants are identified by larger GWAS. Importantly, BMI and endometrial cancer PRS exert independent effects on the risk of developing endometrial cancer, revealing a substantial increase in risk for individuals with highest BMI and high PRS. While underscoring the potential protective impact of weight loss, our findings also indicate that elevated PRS poses an increased risk, even in individuals with a normal weight. These insights emphasize the complex interplay between genetic susceptibility, lifestyle factors, and obesity in endometrial cancer risk, reinforcing the need for personalized and nuanced approaches to screening and preventive interventions.
Supplementary Information
Additional file 1: Table S1. Baseline characteristics of the 129,829 UK Biobank female participants included in the study, stratified by endometrial cancer case status. Characteristics include demographic data, anthropometric measurements, reproductive factors, and hormone levelsand their corresponding polygenic scores.Additional file 2: Supplementary Figures. Fig S1–Distribution of the endometrial cancer polygenic risk scoreby unrelated female cases and controls of European genetic ancestry in UK Biobank. Fig S2–Joint association of genetic risk and BMI with endometrial cancer risk with additional adjustment for continuous BMI.Additional file 3: Supplementary Note. Sensitivity analyses using reported age of menopause and measured SHBG and testosterone levels instead of polygenic scores for these variables. Includes Table 1 comparing discriminative performance of prediction models and Figs. 1–2 demonstrating that BMI and PRS independently associate with endometrial cancer risk without evidence of interaction.Additional file 4: Supplementary Note. Strengthening the Reporting of Observational Studies in Epidemiology checklist.
The reference list from the paper itself. Each links out to its DOI / PubMed record.
- 1UK Biobank data showcase. https://biobank.ctsu.ox.ac.uk/crystal/browse.cgi.
- 2GWAS Catalog. Trait: Age at natural menopause. https://www.ebi.ac.uk/gwas/studies/GCST 90320256.
- 3GWAS Catalog. Trait: Sex hormone-binding globulin levels. https://www.ebi.ac.uk/gwas/studies/GCST 90012107.
- 4GWAS Catalog. Trait: Total testosterone levels. https://www.ebi.ac.uk/gwas/studies/GCST 90012112.
- 5GWAS Catalog. Trait: Endometrial Cancer. https://www.ebi.ac.uk/gwas/studies/GCST 006464.
