Multi-trait Analysis of GWAS Expands Eosinophilic Esophagitis Genetic Susceptibility and Polygenic Risk Scores
Michael P. Trimarchi, Bahram Namjou-Khales, Netali Ben-Baruch Morgenstern, Mark Rochman, Xiaoting Chen, Garrett Osswald, John Besse, Molly Shook, Julie Caldwell, Michael Lape, Tetsuo Shota, Matthew T. Weirauch, Melanie Ruffner, Gregory Constantine, Lisa J. Martin

TL;DR
This study identifies new genetic risk factors for eosinophilic esophagitis and improves risk prediction by combining data from multiple atopic diseases.
Contribution
The study introduces a multi-trait GWAS analysis and a novel polygenic risk score for EoE, revealing new loci and biological pathways.
Findings
A GWAS identified 11 independent genetic risk variants for EoE, including 3 new loci.
A multi-trait analysis identified 33 independent EoE risk variants spanning 24 loci, including 13 novel loci.
A polygenic risk score derived from the MTAG discriminated EoE risk better than a score derived from the GWAS alone.
Abstract
Atopic diseases, including eosinophilic esophagitis (EoE), are driven in part by genetic susceptibility. We performed a genome-wide association study (GWAS) of 1,757 EoE and 14,467 population controls, identifying 11 independent genetic risk variants spanning 8 EoE risk loci (p < 5×10^−8^), including 3 new loci. A multi-trait analysis of GWAS (MTAG) of EoE and other atopic diseases including over 450,000 subjects from the UK Biobank study identified 33 independent EoE genetic risk variants spanning 24 loci, including 13 novel loci. Functional studies nominated 90 EoE candidate genes, some involved in unexpected pathoetiology beyond type 2 immunity. A polygenic risk score derived from the MTAG replicated high risk of EoE compared with PRS derived from GWAS alone (OR 11.57 [6.90–19.40] in the top vs. bottom decile). An interactive tool (EGIDExpress) was developed to enable dataset queries…
Introduction
Eosinophilic esophagitis (EoE) is a chronic, antigen-induced allergic inflammatory disorder affecting approximately 1 in 700 individuals and characterized by the infiltration of eosinophils in the esophagus^1–6^. EoE is strongly linked to allergic inflammation, as evidenced by the success of elimination diets that remove food allergens, the frequent co-occurrence of EoE with other allergic conditions like asthma and atopic dermatitis, and animal models in which allergen exposure induces EoE^5,7,8^. Furthermore, studies have shown the crucial role of type 2 immune mediators, such as interleukin 4 (IL-4) and interleukin 13 (IL-13), in both experimental and clinical settings^9–12^. The inflammatory cascade associated with EoE leads to impairment of the esophageal epithelial barrier, characterized by dilated intercellular spaces, dysregulated gene expression, basal cell hyperplasia, local tissue eosinophilia, disrupted epithelial differentiation, and fibrostenosis, if left untreated^13,14^.
The pathogenesis of EoE is primarily driven by food hypersensitivity and allergic responses and influenced by genetic and environmental factors^15–17^. Genetic association studies have nominated a limited number of risk loci at genome-wide significance^18–24^. Some risk loci are shared with other allergic diseases. For instance, variants at the 5q22 (TSLP/WDR36), 11q13 (LRRC32/C11orf30), and 12q13 (STAT6) loci are associated with asthma and atopic dermatitis and have been consistently associated with EoE risk^19,24^. To date, the robustly associated 2p23 (CAPN14) EoE risk locus is unique to EoE and not implicated in other atopic diseases^19,20,24,25^. Other EoE loci, such as 19q13 (ANKRD27) and 16p13 (CLEC16A), have been replicated within individual studies, though more research is needed to fully understand their contribution^19,21^. Gene-gene interactions, particularly between IL4 and TSLP, and gene-environment interactions have also been shown to have significant roles in EoE development^26,27^.
To further investigate the genetic underpinnings of EoE, we conducted a genome-wide association study (GWAS) aimed at replicating and refining known risk loci and identifying new ones. The study comprised two cohorts of independent cases and controls of European ancestry, including 1,757 individuals with EoE who were assessed in the context of population, atopic, and non-atopic controls. Between six hundred thousand and two million genetic variants were genotyped, yielding genome-wide coverage of five to eight million variants after imputation. Furthermore, because EoE is genetically correlated with other allergic diseases^22^, we performed a multi-trait analysis, examining loci shared between EoE and other atopic diseases in a larger cohort of more than 93,000 individuals with and 362,000 without history of atopic disease using the United Kingdom Biobank (UKB) dataset. Each analysis was bolstered by functional studies that integrated expression quantitative trait loci (eQTL) data and chromatin interaction models to predict causal genes linked to EoE risk loci. Accordingly, we identified 90 genes associated with EoE risk, including 47 not previously implicated in the disease. Furthermore, the multi-trait analysis identified a substantial polygenic risk for EoE; for example, those in the fifth decile had a 5-fold increased risk and those in the top decile had greater than an 11-fold increased risk of EoE compared to those in the bottom decile. These findings expand our understanding of the genetic architecture of EoE and highlight the utility of integrating genetic data across related atopic conditions. Additionally, the results of this study are available through the online interactive tool EGIDExpress. Taken together, our findings have uncovered a large set of genetic risk factors that contribute to the pathoetiology of EoE and provide a public resource for the field.
Results
EoE GWAS study
To identify new EoE genetic risk loci and replicate previously described loci, we assembled two independent EoE case-control cohorts (herein referred to as EoE_1_ and EoE_2_, as well as the merged cohort EoE_T_) encompassing a total of 1,757 participants with EoE and 14,467 population controls without EoE and performed a GWAS (Fig. 1, Figure S1, Figure S2, Table S1). Eleven lead variants across 8 EoE risk loci met genome-wide significance (p < 5×10^−8^): 2p23.1, 5q22.1, 5q23.1, 8q21.13, 10p14, 11q13.5, 16p13.13, and 16p12.1 (Fig. 2, Table 1A). An additional 17 lead variants across 14 EoE risk loci showed putative association (10^−6^ > p > 5×10^−8^, Table 1B).
Two risk loci displayed multiple, independent associations (3 at 2p23.1, 2 at 11q13.5). At 2p23, conditional logistic regression analysis revealed multiple variants independently associated with EoE. Two variants that were independently associated with EoE (rs77569859, rs11124250) at 2p23 within the gene body of CAPN14 were near or overlapping previously reported variants^24^. A third association at rs77746069 between exon 1 and 2 of the downstream gene GALNT14 demonstrated residual attenuated association after conditioning on the other two variants (Figure S3). At 11q13.5, we replicated two independent associations (rs61894547, rs7936323, linkage disequilibrium [LD] r^2^ < 0.1) that had been previously reported^24^.
Three novel, genome-wide significant EoE risk loci were mapped to the following chromosomal bands: 8q21.13, 10p14, and 16p12.1. The 8q21.13 locus has previously been associated with atopic disease and eosinophil counts^28,29^; chromatin interaction data suggest a long-range interaction with the gene ZBTB10, a transcription factor^30^ (Table S2). Variants at 10p14 have been previously associated with EoE in the vicinity of ITIH5 and GATA3^24^; however, this study identified an independent association (LD r^2^ < 0.01) over 1 Mb away in an intergenic region, a locus that has been associated with eosinophil counts and atopic disease^28,29^. The 16p12.1 risk locus encompasses a 120-kb region spanning KDM8, NSMCE1, and IL4R that has been associated with asthma^31,32^; the region contains eQTLs for both NSMCE1 (expressed in esophageal mucosa, monocytes, CD4+ T cells) and IL4R (expressed in esophageal mucosa, monocytes), and associated variants also interact with IL4R via chromatin looping. The significant variants at this locus were not in LD with the Q576R amino acid substitution (rs1801275) associated with atopic dermatitis and asthma (LD r^2^ < 0.1)^33,34^.
Five of the EoE risk loci identified have been previously reported: 2p23.1, 5q22.1, 5q23.1, 11q13.5, and 16p13.13. 2p23.1 includes CAPN14, which encodes a protease that is uniquely expressed in the esophageal epithelia as a function of IL-13–driven inflammation, EoE disease activity, and EoE risk genotype^19,20,35–37^. The 5q22.1 risk locus encompasses a 200-kb region of LD overlapping TSLP, WDR36, and CAMK4; this locus is also known to be associated with atopic disorders and peripheral blood eosinophil counts^38,39^. TSLP is a pro-atopy alarmin^40^ that has an important role in the development and activation of many immune cells and promotes Th2 cell development through IL-4 activation^41^ and proliferation of pathogenic effector memory Th2 cells^42^. The association at 5q23.1^24^ lies within a weakly characterized intergenic region. The risk locus at 11q13.5 is located between EMSY (also known as C11orf30) and LRRC32; functional studies suggest a role for both genes in EoE^24,43^, and the locus is also associated with eosinophil counts^28,44^, asthma^45^, atopic dermatitis^46^, allergic rhinitis^27^, serum IgE levels^47^, and Crohn disease^48^. The 16p13.13 risk locus^24^ overlaps CLEC16A and is associated with eosinophil counts and atopic disease; associated variants also include eQTLs for DEXI in CD4+ T cells and monocytes. Though CAPN14 likely contributes to the tissue specificity of EoE, the EoE risk genes TSLP, EMSY, LRRC32, and CLEC16A are hypothesized to have roles in numerous allergic diseases^49^.
Cases with EoE were also compared to controls with and without atopic disease (Table S5). Four additional risk loci were identified using non-atopic controls (3q28, 5q31.1, 8p23.1, 15q23), of which 5q31.1 has been previously reported (Table S2)^22^. The 3q28, 5q31.1, and 8p23.1 risk loci were confirmed by a multi-trait analysis approach (see next section).
Multi-trait analysis of EoE and other atopic disease GWAS (EoE-atopy-MTAG)
EoE is a relatively rare atopic disease commonly associated with other atopic comorbidities^50,51^; 86% of the participants with EoE in this study reported at least one other atopic condition (Table S1). A recent EoE GWAS by Chang et al. showed a genetic correlation (r_g_) of over 50% with other atopic diseases, including allergic asthma, allergic rhinitis, and atopic dermatitis^22^. Consistent with previous findings, we found a strong genetic correlation in this study using an LD score regression approach (Table S3).
The multi-trait analysis of GWAS (MTAG) method was developed to augment the power of single-trait genetic studies by incorporating information from highly correlated traits (r_g_ > 0.5)^52^. Unlike traditional meta-analysis, which generates a single consensus effect estimate per variant across all studies and often reduces power when comparing different traits, MTAG offers separate effect estimates for each trait and is robust to sample overlaps.
The EoE and other atopic disease MTAG (EoE-atopy-MTAG) presented here tested the hypothesis for each variant that EoE and atopic disease share a common genetic etiology. EoE_1_ (n = 1,033 participants with EoE) was compared to atopic association studies derived from the UKB study (n = 93,386 participants with atopic disease) (Table S5). Leveraging whole-genome imputation, over 6 million variants were assessed. Fifty-six independent variants met genome-wide significance (p < 5×10^−8^), and 33 remained after removing 23 variants that did not meet a minimum threshold in the EoE_1_ cohort. These 33 lead variants spanned 24 genome-wide significant risk loci with robust evidence in EoE (p < 5×10^−8^, EoE_1_ p < 0.05) (Fig. 3). Of the 8 risk loci identified in the analysis of EoE_T_ (EoE n = 1,757), 6 were significant in the multi-trait analysis, highlighting the strong genetic correlation across allergic diseases. Variants at 2p23.1 and 5q23.1 identified in the case-control study were not significant in EoE-atopy-MTAG, supporting tissue specificity. Of the 24 risk loci identified by the MTAG analysis, 13 were novel in EoE (1q21.3, 1q24.2, 1q25.1, 3q28, 3q29, 4q27, 7p15.1, 8p23.1, 9p24.1, 11q23.3, 14q24.1, 16q12.1, 19q13.11) (Table 2A), 2 were replicated in EoE_T_ (8q21.13, 10p14), and 9 were previously reported^18–23^ (5q22.1, 5q31.1, 6q15, 11q13.5, 12q13.3, 15q22.2, 15q22.33, 16p13.13, 16p12.1) (Table 2B). Of the non-novel loci, 12q13.3 (STAT6; rs3024971) showed a new independent association (LD r^2^ < 0.1) 10 kb downstream of the variant reported by Sleiman et al.^25^.
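The two-stage filter described above (genome-wide significance in the MTAG, then nominal support in EoE_1_) can be sketched as follows. This is only an illustration: the variant records, identifiers, and field names are hypothetical placeholders, and only the two p-value thresholds are taken from the text.

```python
from dataclasses import dataclass

GENOME_WIDE_P = 5e-8   # MTAG significance threshold stated in the text
NOMINAL_EOE_P = 0.05   # minimum single-trait support in EoE_1 stated in the text


@dataclass(frozen=True)
class Variant:
    rsid: str      # hypothetical identifier, not a real variant
    p_mtag: float  # p-value from the multi-trait analysis
    p_eoe1: float  # p-value from the EoE_1 case-control analysis


def filter_mtag_hits(variants):
    """Keep variants significant in the MTAG that also show nominal EoE_1 support."""
    return [
        v for v in variants
        if v.p_mtag < GENOME_WIDE_P and v.p_eoe1 < NOMINAL_EOE_P
    ]


# Made-up records for illustration only
variants = [
    Variant("rs_example_1", 1e-12, 3e-4),  # passes both thresholds
    Variant("rs_example_2", 2e-9, 0.40),   # MTAG hit without EoE_1 support
    Variant("rs_example_3", 1e-6, 1e-3),   # not genome-wide significant
]
kept = filter_mtag_hits(variants)
print([v.rsid for v in kept])
```

Requiring support in the EoE cohort itself guards against reporting loci whose MTAG signal comes mostly from the much larger atopic disease cohorts.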
To determine whether the EoE association seen in the EoE-atopy-MTAG was driven primarily by other atopic diseases, we compared the normalized effect size for the 33 EoE-atopy-MTAG lead variants across each of the constituent cohorts (Fig. 4A). Effect size was largest in EoE for 32 of 33 lead variants that met study thresholds for significance. Median effect size was substantially larger in EoE than other atopic diseases (EoE beta 0.24 vs. 0.09 for asthma, the next largest; p < 0.001 Repeated Measures ANOVA on Ranks and p < 0.05 Tukey test) (Figure S4). In summary, the EoE risk loci identified by the MTAG analysis were not driven primarily by other atopic diseases.
The genes encoded at the 13 new EoE genetic risk loci suggest biological mechanisms that lead to the development and pathogenesis of EoE. Notably, the 1q21.3 risk locus maps to an expansive 600-kb block within the epidermal differentiation complex^53^ including FLG and FLG2, which have known roles in esophageal barrier function in EoE^54,55^. EoE risk genes were expressed in both epithelial and immune cells, with many of the genes showing supporting evidence in mechanistic studies and/or differential expression in patients with EoE compared to controls (Table S6).
Interactive zoom plots are available on the LocusZoom platform for exploration of the association of individual variants in a genomic context for the EoE_T_ and EoE-atopy-MTAG analyses.
Sex-stratified association analysis of EoE risk loci
Due to the male predominance of EoE, we performed a sex-stratified analysis of EoE_T_ risk loci to test the hypothesis that some genetic risk would be amplified in males relative to females. A genome-wide, sex-stratified discovery analysis (Figure S5) identified one male-specific locus with genome-wide significance: 12q24.31 mapping to SCARB1 (p = 2.01×10^−10^; odds ratio [OR] = 1.44, 95% confidence interval [CI] 1.29–1.62, Figure S5A) with no association observed in females (p = 0.75, OR = 0.97, 95% CI 0.81–1.16). SCARB1 encodes a scavenger receptor that binds cholesterol esters and has been implicated in inflammatory responses^56^. Because there were fewer females than males with EoE and a discovery analysis was likely underpowered to detect female-specific associations at genome-wide significance, a more limited sex-stratified analysis was performed for the genome-wide significant risk loci identified in EoE_T_, EoE-atopy-MTAG, and the SCARB1 risk locus (Table S4). No other sex-specific associations (i.e., with non-overlapping 95% CIs between sexes) were observed. Together, these results indicate male-specific association at 12q24.31 and no evidence of loci with female-specific association.
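As a minimal sketch, assuming that a sex-specific association is judged by whether the male and female 95% CIs for the OR fail to overlap, the SCARB1 result can be checked with the intervals reported above. The helper name is hypothetical.

```python
def intervals_overlap(ci_a, ci_b):
    """Return True if two closed intervals (low, high) share any point."""
    (lo_a, hi_a), (lo_b, hi_b) = ci_a, ci_b
    return lo_a <= hi_b and lo_b <= hi_a


# 95% CIs for the 12q24.31 (SCARB1) odds ratio, as reported in the text
male_ci = (1.29, 1.62)
female_ci = (0.81, 1.16)

# Assumed criterion: non-overlapping CIs indicate a sex-specific association
sex_specific = not intervals_overlap(male_ci, female_ci)
print(sex_specific)
```

Non-overlap of CIs is a conservative criterion; a formal test of the difference between the sex-specific effect estimates would be a common alternative.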
A validated polygenic risk score (PRS) for EoE
GWAS identifies population-level associations between specific genetic risk loci and disease, whereas a polygenic risk score (PRS) can identify overall genetic risk burden for individuals. Two EoE PRS were developed in the EoE_1_ cohort: PRS-EoE_GWAS_, derived from the EoE_1_ case-control GWAS summary statistics, and PRS-EoE_MTAG_, derived from the EoE-atopy-MTAG summary statistics. Both the PRS-EoE_GWAS_ and PRS-EoE_MTAG_ were assessed in the independent EoE_2_ cohort. Globally, the EoE_2_ cohort had a significantly higher median standardized PRS-EoE_GWAS_ in cases than controls (68% vs. 45%, respectively, p < 0.0001) (Fig. 5A–B). A comparison of the percentile distributions of the two EoE PRS between cases and controls and the risk prediction for each decile in the EoE_2_ cohort are shown in Fig. 5. The discrimination between the top 10th percentile and the bottom 10th percentile of the PRS-EoE_MTAG_ (OR = 11.57, 95% CI 6.90–19.40) was superior to that of the PRS-EoE_GWAS_ (OR = 5.07, 95% CI 3.41–7.54) (Fig. 5C). The increase in odds of EoE per standard deviation of the PRS-EoE_MTAG_ and PRS-EoE_GWAS_ was 1.71 (95% CI 1.52–1.92) and 1.65 (95% CI 1.51–1.80), respectively. Notably, the PRS-EoE_MTAG_ outperformed the PRS-EoE_GWAS_ in predicting EoE risk (p < 0.0001), further supporting the value of the multi-trait approach.
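For readers unfamiliar with these quantities, the sketch below shows one conventional way to compute them. It assumes a PRS that is a weighted sum of risk-allele dosages and a top-versus-bottom decile OR with a Wald interval on the log scale; the section does not state which software or estimator was used. All dosages, weights, and counts are made up, and the function names are hypothetical.

```python
import math


def polygenic_score(dosages, weights):
    """Weighted sum of risk-allele dosages (0 to 2) using per-variant effect sizes."""
    if len(dosages) != len(weights):
        raise ValueError("dosages and weights must have the same length")
    return sum(d * w for d, w in zip(dosages, weights))


def odds_ratio_wald(cases_top, controls_top, cases_bottom, controls_bottom, z=1.96):
    """Odds ratio (top vs. bottom group) with a Wald 95% CI on the log-odds scale."""
    odds_ratio = (cases_top * controls_bottom) / (controls_top * cases_bottom)
    se = math.sqrt(
        1 / cases_top + 1 / controls_top + 1 / cases_bottom + 1 / controls_bottom
    )
    log_or = math.log(odds_ratio)
    return odds_ratio, math.exp(log_or - z * se), math.exp(log_or + z * se)


# Hypothetical individual: three variants with made-up effect sizes (log odds)
score = polygenic_score([0, 1, 2], [0.10, 0.25, 0.05])

# Hypothetical decile counts: cases and controls in the top and bottom deciles
or_est, ci_low, ci_high = odds_ratio_wald(60, 40, 20, 80)

# A Wald interval is symmetric on the log scale, so the point estimate is the
# geometric mean of its bounds; the reported MTAG interval fits this pattern.
reported_mid = math.sqrt(6.90 * 19.40)
print(round(score, 2), round(or_est, 2), round(reported_mid, 2))
```

The last check uses only the interval reported in the text and suggests that the published CIs are compatible with a log-scale Wald construction, although that remains an inference rather than a stated method.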
Mapping EoE risk loci to candidate genes
Independent significant variants from the association studies presented above were mapped to genes using three methods: eQTL (esophageal mucosa and immune cells), chromatin interaction (esophagus and immune cells), and position (Table 3). A total of 90 candidate EoE risk genes were mapped, including 26 that were mapped from EoE_T_ (Tables S2 and S9). A presentation of each of the 90 candidate genes in the context of originating association study and mapping method is provided in Table S10 and Figure S6.
Expression and function of EoE candidate risk genes in disease
To understand how the EoE risk loci might influence EoE biology, we first examined RNA expression of the 90 mapped candidate genes in esophageal biopsies. Disease-dependent expression was observed for 22 of 59 candidates that met expression thresholds; 19 of the 22 genes were upregulated in the diseased state (Fig. 6). This reflected significant enrichment for differentially expressed genes among EoE risk candidates compared to genomic background (22/59 [37%] vs. 2,684/11,340 [24%], Fisher exact test p = 0.020), affirming that the mapping process produced disease-relevant genes.
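The enrichment test above can be reproduced in outline from the reported counts. The sketch below is a standard-library implementation of a two-sided Fisher exact test; it assumes the 59 candidates and the 11,340 background genes are treated as separate groups in the 2×2 table, which the text does not specify.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, col1, total = a + b, a + c, a + b + c + d
    denom = comb(total, row1)

    def pmf(x):
        # Hypergeometric probability of x first-column counts in the first row
        return comb(col1, x) * comb(total - col1, row1 - x) / denom

    lowest = max(0, row1 - (total - col1))
    highest = min(row1, col1)
    p_observed = pmf(a)
    # Sum the probabilities of all tables no more likely than the observed one
    return sum(
        p for p in map(pmf, range(lowest, highest + 1))
        if p <= p_observed * (1 + 1e-9)
    )


# Counts reported in the text
candidates_de, candidates_total = 22, 59
background_de, background_total = 2684, 11340

p_value = fisher_exact_two_sided(
    candidates_de, candidates_total - candidates_de,
    background_de, background_total - background_de,
)
print(f"{p_value:.3f}")
```

If the background set in the original analysis included the candidate genes, the table would differ slightly and so would the p-value.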
Ingenuity Pathway Analysis of the 90 candidate risk genes revealed enrichment for inflammatory signaling pathways, including canonical Th1 and Th2 activation (CD247, IL13, IL18R1, IL2, IL4, IL4R, IL5, IRF1, STAT6, TNFSF4, TSLP) (Tables S6 and S11). Other pathways identified included IL-27 signaling, NOD1/2 signaling, and the regulators BCL6, TNFSF4, and EBI3 (IL-27 subunit beta). IL-27 has diverse roles in innate and adaptive immunity mediated in part through activation of JAK/STAT signaling in CD4+ T cells, whereby IL-27 can promote or suppress inflammation depending on context^57^. Though NOD1/2 signaling is typically associated with pathogen recognition and innate immune responses, genetic variants in NOD1 and NOD2 have been associated with atopic dermatitis^58–60^, suggesting a role in allergic inflammation. BCL6 is a transcriptional repressor that can compete for STAT DNA binding sites, suppressing STAT-dependent IL-4 responses, and is itself regulated by STATs^61^. TNFSF4 is a TNF family cytokine that promotes T cell recruitment, proliferation, and cytokine production, with genetic polymorphisms that have been linked to allergic rhinitis^62^. Further research is needed to determine whether these upstream regulators drive EoE pathology and whether they may be viable therapeutic targets.
EoE genetic risk loci highlight disease-specific transcriptional mechanisms and cell types
To identify common transcriptional mechanisms that might be controlling multiple risk loci, enrichment for transcription factor genomic binding events (i.e., chromatin immunoprecipitation sequencing [ChIP-seq] peaks) at EoE risk loci was assessed using the Regulatory Element Locus Intersection (RELI) method^63^. Transcription factors MYB (Th2 cells), STAT5A (CD4+ T cells), and STAT5B (CD8+ T cells) (Table S7) bound EoE genetic risk loci more than expected compared to a null model of accessible chromatin. C-MYB promotes the proliferation and differentiation of hematopoietic stem cells and may directly impact CD8+ T cell differentiation and survival^64^, whereas STAT5 has well-known roles in inflammatory signaling, especially TSLP-induced Th2 signaling in EoE, mast cell activation in atopic dermatitis, IL-5 signal transduction in eosinophils, and activation of innate lymphoid cells (ILC2)^42,65–67^. A previous study examining allelic transcriptional regulatory activity at variants across previously published EoE risk loci identified enrichment for GATA3^68^. When assessing all EoE risk loci identified in this study, we observed overlap with GATA3 ChIP-seq peaks at 16 risk variants spanning independent risk loci; however, enriched GATA3 ChIP-seq overlap was not robust to multiple testing correction (p-adjusted > 0.05).
To understand which cell types express the candidate risk genes in EoE, we interrogated a recently published single-cell RNA sequencing (scRNA-seq) dataset that identified 60 distinct cell states in the esophagus in the context of EoE^69^. Forty-five of 90 candidates were expressed in biopsies in at least one cell type (Fig. 7 and Figure S7). Candidate genes were expressed across a diverse range of cell types, with 29 of 45 genes coalescing into distinct clusters of cellular localization. Seven genes were predominantly localized to the epithelium: apical (CAPN14, CRNN, TPRG1, SMAD3) and basal (CLNS1A, TSLP, WDR36), consistent with these genes’ roles in barrier function, external sensing, and initiation of inflammation^70^. Six genes were predominantly expressed in fibroblasts (LPP, FAM114A1, P4HA2, PHLDB1, LRP1, LRRC32), suggesting a role for fibroblasts in sustaining inflammation and disrupting restoration of barrier integrity as seen in a model of atopic dermatitis^71^ and recent studies implicating fibroblasts in EoE pathogenesis^22,72^. Five genes predominantly localized to mast cells (JAZF1, IL18R1, ESYT1, NSMCE1, IL4R), consistent with increasing attention on mast cells as drivers of EoE pathology^73^. Three genes were largely localized to macrophages (TLR1, TLR6, CIITA), suggesting a role for inappropriate or excessive activation of antigen-presenting cells in EoE pathogenesis^74^. Numerous genes were broadly expressed in lymphocytes and group 2 ILCs (ILC2s), with some more specifically expressed in specific CD4 + T cell subsets. Three genes localized to ILC2s and Th2 cells (IL13, IL5, RAD50), consistent with the understanding of EoE as a Th2-driven atopic disease^75^. Four genes localized to Th17 cells (CBL, MFHAS1, RORA, CAMK4), though many of these genes were also expressed to a lesser extent in Th2 cells. 
In atopic dermatitis, IL-17–producing Th17 cells have been postulated to be bystanders responding to general barrier dysfunction^76^, though their association with EoE genetic susceptibility here suggests that they deserve further investigation.
Discussion
Herein, we have advanced the understanding of the genetic architecture and underlying pathophysiology of EoE by expanding GWAS data and conducting the first multi-trait analysis of GWAS incorporating other atopic conditions, including asthma, allergic rhinitis, and atopic dermatitis. Through the collective examination of 1,757 individuals with EoE and 93,386 individuals with atopic disease, GWAS identified 11 variants across 8 risk loci; 3 of these loci were novel and included regions near classical immune regulators, such as GATA3 and IL4R. The finding of a novel male-specific association at 12q24.31 (SCARB1) is consistent with the observed higher heritability of EoE in males that is established by familial EoE and twin studies^77,78^ and a recent independent GWAS that identified 5 distinct sex-specific risk loci^22^.
The multi-trait analysis of EoE and atopy GWAS (EoE-atopy-MTAG) demonstrates how the strong genetic correlations between EoE and other atopic diseases can be leveraged to identify new EoE susceptibility loci. Importantly, 33 independent variants spanning 24 risk loci were identified through MTAG analysis, and 13 of these loci were novel in EoE. Beyond genetic risk loci, the MTAG approach implicated genes potentially important in EoE, including those mapped to the epidermal differentiation complex at 1q21, which contains the greatest density of genes dysregulated in the EoE transcriptome, such as FLG, FLG2, and CRNN^79^. Despite the statistical testing bias limitations of the MTAG approach^52^ and an imbalanced cohort size of EoE compared with the other atopic diseases, the MTAG approach employed in our study added valuable findings beyond the traditional GWAS approach. MTAG allowed us to leverage publicly available GWAS data (93,000 atopic individuals in the UKB) for the analysis of EoE susceptibility, illustrating MTAG’s utility for the study of less common diseases. MTAG-derived associations demonstrated high overlap with traditional EoE GWAS loci, verifying the approach. Moreover, the effect size (OR) for the MTAG-derived associations was substantially larger in EoE than other atopic diseases. The MTAG-derived PRS outperformed the conventional GWAS–derived PRS (EoE_1_ cohort), further validating the efficacy of the MTAG approach. Importantly, the MTAG approach did not replicate the 2p23 (CAPN14) EoE risk locus, consistent with the lack of association at this locus with other allergic diseases. MTAG complements but does not replace single-trait GWAS approaches, as it will miss trait-specific loci such as CAPN14. Collectively, our findings substantiate the hypothesis of a shared genetic basis for EoE and other atopic disease^52^.
This study leveraged multiple mapping approaches together with single-trait and multi-trait GWAS to comprehensively map a total of 90 candidate EoE risk genes, greatly expanding the catalog of genes that may contribute to EoE disease susceptibility. Twenty-two of these candidate genes showed disease-dependent differential expression, orthogonally emphasizing potential contributions to EoE pathogenesis. Pathway enrichment analysis reinforced the central role of type 2 immunity in EoE pathogenesis. Genes in the NOD1/2 signaling pathway (e.g., CYLD and NOD2) and IL-27 regulatory pathways suggest broader immune regulation beyond type 2 responses, which may interact with microbial and environmental triggers. The EoE risk genes were expressed in a diverse set of cells, mainly coalescing into 7 distinct cellular clusters, including epithelial cells (basal and apical populations), fibroblasts, mast cells, myeloid cells, and polarized T helper subsets, especially Th2 and Th17 cells, the latter cells being previously underappreciated in EoE.
We observed enrichment of transcription factor genomic binding events for STAT5A/B and MYB, known regulators of Th2 cell activation and survival. STAT5 signaling, particularly in response to TSLP, has a crucial role in promoting Th2-mediated inflammation and eosinophilia, whereas MYB promotes T cell differentiation and hematopoietic cell proliferation. These findings highlight shared transcriptional circuits between genetic risk loci, providing opportunities for targeted interventions. Additional studies are warranted to comprehensively assess allelic transcriptional regulation at the expanded list of EoE risk loci.
A key outcome of this study is the development and validation of a PRS for EoE. Our previous study measured the increased effect size–weighted genetic burden of EoE risk variants for cases with EoE relative to controls but did not validate a PRS in an independent cohort^24^. Herein, we validated the PRS findings identified in the EoE_1_ cohort with the EoE_2_ cohort, demonstrating robust performance: individuals in the top compared to bottom decile showed nearly 12-fold increased risk of developing EoE. For comparison, the top effect size EoE risk variant (rs77746069) showed only a 2-fold increased risk (Table 1A). Notably, the MTAG-derived PRS outperformed the EoE-specific GWAS-derived PRS (PRS-EoE_GWAS_), substantiating the proposed value of MTAG for calculating polygenic risk^52^. Moreover, the magnitude of the PRS-EoE_MTAG_ is robust compared with that of PRS of other immune-mediated, polygenic, allergic diseases (e.g., asthma), which have only one third to one half of the PRS effect size of PRS-EoE_MTAG_^80–82^. The relatively large PRS-EoE_MTAG_ is consistent with the larger sibling risk ratio seen in EoE compared with other atopic diseases^77,78^. These findings facilitate future efforts to integrate PRS into risk stratification, early diagnosis, and precision medicine strategies for EoE.
In conclusion, this study expands the genetic landscape of EoE and highlights pathways central to its pathogenesis, including type 2 inflammation, epithelial barrier dysfunction, and innate immune activation. By integrating functional genomics and multi-trait analyses, we provide novel insights into the shared genetic architecture of EoE and other allergic diseases. The validated PRS underscores the potential for genetic risk assessment in clinical practice, while the identification of novel loci increases possibilities for future therapeutic targeting. Finally, an interactive tool (EGIDExpress) enables users to query and visualize the datasets, providing a useful resource for the field of type 2 immunity. Together, these findings represent a significant step toward understanding, diagnosing, and treating EoE within the context of atopic disease.
Methods
GWAS participants
The study was approved by the Institutional Review Boards at Cincinnati Children’s Hospital Medical Center (CCHMC) and all participating sites that were part of the National Institutes of Health (NIH) Consortium of Food Allergy Research (CoFAR)^50^ and the Consortium of Eosinophilic Gastrointestinal Disease Researchers (CEGIR) EoE Cohorts^83,84^. Cases were confirmed by a physician as fulfilling the diagnostic criteria for EoE (peak eosinophil count ≥ 15 eosinophils / high-power field in esophageal biopsy sections)^85^. A participant was considered to have atopy if they had a clinical history of atopic dermatitis, asthma, allergic rhinitis, and/or food allergy. Self-reported history of allergic rhinitis and atopic dermatitis was sufficient for diagnosis, when available.
Participants with EoE included 1,204 newly genotyped subjects who were included in EoE_1_ or EoE_2_ on the basis of genotype platform; EoE_1_ included those cases genotyped on the Illumina OMNI2.5 or OMNI5, whereas EoE_2_ included those cases genotyped on the Infinium Global Screening Array (GSA). Data from 553 subjects with EoE whose data were deposited in the database of Genotypes and Phenotypes (dbGaP) under accession phs000494.v1.p1 from a prior study^19^ were included in EoE_1_ to further increase statistical power to identify new associations. For comparisons with EoE, atopic controls were randomly selected from the UKB dataset to equally represent atopic dermatitis, asthma, and allergic rhinitis (project 47377). External controls were from the University of Michigan Health and Retirement Study (obtained from dbGaP accession phs000428.v1.p1), FECD-CIDR study (phs000421.v1.p1), KIDRISK study (phs001271.v1.p1), Framingham SNP Health Association Resource (phs000342.v21.p14), and Cincinnati Genomic Control Cohort^86^. Participants in the UKB dataset were stratified by atopic status using the mapped international classification of diseases (ICD) diagnostic codes (Table S6)^87^.
Genome-wide genotyping
Genotyping was performed on either the Illumina OMNI2.5 genotyping array (n = 480) as previously described^19^ or the Infinium GSA (n = 724) according to the manufacturer’s protocol by the Regeneron Genetics Center^88,89^.
Genotyping quality control
Quality control for autosomal variants was performed in a stepwise process using a pipeline adapted from Marees et al.^90^. Strand alignment was performed using the McCarthy Group Imputation Preparation and Checking Tool (https://www.well.ox.ac.uk/~wrayner/tools/). Genomic inflation and other relevant data for each dataset are summarized in Table S5.
Relatedness and population stratification
Related individuals (second degree or closer) were excluded using GRAF-rel^91^. This study was performed based on the availability of sufficient numbers of consented subjects of one ancestral group (European) to perform an informative analysis. Individuals with non-European ancestry were excluded using GRAF-pop^92^.
Genome-wide genotype imputation
Genotype imputation was performed using the TOPMed Imputation Server running Minimac4 1.5.7 with TOPMed r2 mixed ancestry reference panel^93^. Imputed data were retained for variants with minor allele frequency exceeding 1% and variance ratio (r^2^) exceeding 0.8. Imputed data were mapped to SNPs using dbSNP b151.
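The post-imputation filter described above can be sketched in a few lines of Python. This is a minimal illustration, not the study's pipeline: the `ImputedVariant` record and the function names are hypothetical, and the strict inequalities mirror the word "exceeding" in the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ImputedVariant:
    """Hypothetical per-variant record; field names are illustrative."""
    variant_id: str
    maf: float  # minor allele frequency
    r2: float   # imputation quality (estimated r^2)


def passes_imputation_qc(variant, min_maf=0.01, min_r2=0.8):
    """Keep a variant only if MAF and r^2 both strictly exceed the thresholds."""
    return variant.maf > min_maf and variant.r2 > min_r2


def filter_variants(variants, min_maf=0.01, min_r2=0.8):
    """Return the variants that pass both thresholds, preserving input order."""
    return [v for v in variants if passes_imputation_qc(v, min_maf, min_r2)]
```

A variant with r^2 of exactly 0.8 is dropped under this reading; whether the original pipeline treated the boundary the same way is not stated.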
Genetic association analyses
Association analyses were performed in PLINK v2.0.
For all analyses, a lead genetic variant was defined as the top putatively independent association in an LD block (r^2^ < 0.1, based on the UK Biobank release 2b reference panel), whereas a risk locus was defined as the cluster of ≥ 1 lead variants within 250 kb. A novel risk locus was defined as containing ≥ 1 lead variant > 250 kb distant and linkage independent (r^2^ < 0.1) from previously published EoE risk variants in the GWAS catalog^94^. For risk loci with multiple lead variants in the single-trait analysis of EoE, independent association was verified by conditional logistic regression analysis. Independent significant variants were defined by FUMA^95^ and were used for LD expansion and candidate mapping.
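The grouping of lead variants into risk loci can be illustrated with a short Python sketch. It assumes that "within 250 kb" is applied by chaining: a lead variant joins the current locus if it lies no more than 250 kb from the previous lead on the same chromosome. The source does not specify this detail, and the LD-based independence and novelty checks are omitted here because they require a reference panel. The function name and input layout are hypothetical.

```python
from collections import defaultdict


def group_into_loci(leads, window_bp=250_000):
    """Cluster lead variants into loci by genomic distance.

    leads: iterable of (variant_id, chromosome, position_bp) tuples.
    Returns a list of loci, each a list of variant ids in position order.
    Assumption: consecutive leads on a chromosome are chained into one
    locus when the gap between them is at most `window_bp`.
    """
    by_chrom = defaultdict(list)
    for variant_id, chrom, pos in leads:
        by_chrom[chrom].append((pos, variant_id))

    loci = []
    for chrom in sorted(by_chrom):
        current, last_pos = [], None
        for pos, variant_id in sorted(by_chrom[chrom]):
            # Start a new locus when the gap to the previous lead is too large.
            if last_pos is not None and pos - last_pos > window_bp:
                loci.append(current)
                current = []
            current.append(variant_id)
            last_pos = pos
        if current:
            loci.append(current)
    return loci
```

Under this definition the number of loci can be smaller than the number of lead variants, consistent with the counts reported in the abstract (for example, 11 lead variants spanning 8 loci).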
Cross-platform analyses
Cases and controls were genotyped on different platforms in the association studies EoE_2_, EoE_T_ (which contained EoE_2_), EoE vs. atopic controls, and EoE vs. nonatopic controls (Table S5). Potential batch effects were addressed as follows.
To identify and account for potential batch effects in EoE_2_ (EoE on the GSA, controls on Omni2.5/Omni5), variant calls were compared for 634 samples that were genotyped on both platforms. In total, 2.9 million imputed variants (36%) that were not strictly concordant (p < 0.9) were excluded from most analyses, including tables and Manhattan plots, resulting in a genomic inflation factor (λ) of 1.06. The threshold was relaxed to p < 0.4 for Zoom plots to increase coverage in small windows where platform biases would not be expected to drive discovery of associated variants.
For EoE_T_, in the absence of variant data in EoE_2_, EoE_1_ variant data were used instead (reducing the sample size from 1,757 EoE to 1,033 EoE). For Manhattan plots, variants missing in EoE_2_ were excluded from analysis to avoid large inconsistencies in statistical power that would incorrectly imply differences in effect size. The final genomic inflation was 1.05 with discordant variants excluded.
For EoE vs. atopic and EoE vs. nonatopic controls (EoE on Omni2.5 and GSA, controls on UKB Axiom), 36,000 imputed variants that showed low concordance (p < 0.001) between control populations genotyped on Omni2.5 and UKB Axiom were excluded. Due to the high degree of population stratification observed between controls in the UKB study and controls in the other cohorts, the European ancestry cutoff in GRAF-pop was tightened to P_e_ > 0.99 from the standard threshold of P_e_ > 0.87, excluding ~ 60% of UKB participants who would typically be classified as being of European ancestry. Batch effects between GSA and Omni were handled as described for EoE_T_. The final genomic inflation was 1.13 for EoE vs. atopic and 1.11 for EoE vs. nonatopic controls. Associations within the UKB study, such as those that constituted the EoE-atopy-MTAG multi-trait analysis, did not compare cases and controls on different platforms and therefore did not require these adjustments.
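The genomic inflation factors quoted in this section are conventionally computed as the median observed association χ² statistic divided by the median of a χ² distribution with 1 degree of freedom (about 0.455). The Python sketch below shows that calculation, assuming two-sided p-values from 1-df tests as input; the function name is illustrative and the code is not taken from the study's pipeline.

```python
from statistics import NormalDist, median

_STD_NORMAL = NormalDist()
# Median of a chi-square distribution with 1 df (about 0.4549).
CHI2_1DF_MEDIAN = _STD_NORMAL.inv_cdf(0.75) ** 2


def genomic_inflation(p_values):
    """Estimate lambda from two-sided p-values of 1-df association tests.

    Each p-value (0 < p <= 1) is converted back to a chi-square statistic
    via the squared normal quantile, then the median is rescaled.
    """
    chi2 = [_STD_NORMAL.inv_cdf(p / 2) ** 2 for p in p_values]
    return median(chi2) / CHI2_1DF_MEDIAN
```

Values near 1 indicate little systematic inflation of test statistics; values well above 1 suggest residual stratification or batch effects.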
Multi-trait analysis of EoE and other atopic diseases
We performed multi-trait analysis of GWAS summary statistics using MTAG (v.1.0.8)^52^ to increase power for discovery of genetic loci associated with EoE. MTAG jointly analyzes multiple sets of GWAS summary statistics of genetically correlated traits to enhance statistical power^52^. Regression coefficients (beta) and their standard errors were used as inputs for MTAG. The MTAG program aligned alleles across the different sets of summary statistics and ensured that single-nucleotide polymorphisms (SNPs) were present in all datasets. SNPs missing from any dataset were removed. The final count of SNPs for MTAG analyses was 6,078,836.
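The harmonization step described above (restricting to SNPs shared by all datasets and aligning effect alleles) can be sketched as follows. This is a simplified Python illustration and not MTAG's own code: the data layout and function names are hypothetical, strand flips are not handled, and SNPs whose alleles cannot be matched are simply dropped.

```python
def align_to_reference(reference, other):
    """Align `other` summary statistics to the alleles used in `reference`.

    Both arguments map snp_id -> (effect_allele, other_allele, beta, se).
    SNPs absent from `reference` or with non-matching alleles are dropped.
    When the two alleles are swapped, the sign of beta is flipped.
    """
    aligned = {}
    for snp, (ea, oa, beta, se) in other.items():
        if snp not in reference:
            continue
        ref_ea, ref_oa = reference[snp][0], reference[snp][1]
        if (ea, oa) == (ref_ea, ref_oa):
            aligned[snp] = (ea, oa, beta, se)
        elif (ea, oa) == (ref_oa, ref_ea):
            aligned[snp] = (ref_ea, ref_oa, -beta, se)
        # Any other allele combination is treated as a mismatch and skipped.
    return aligned


def restrict_to_shared(datasets):
    """Keep only SNPs that are present in every dataset."""
    shared = set(datasets[0])
    for data in datasets[1:]:
        shared &= set(data)
    return [{snp: data[snp] for snp in data if snp in shared} for data in datasets]
```

In practice one trait (here, EoE) would serve as the allele reference, each other trait would be aligned to it, and the intersection would then be taken across all traits.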
Candidate gene mapping
Independent significant variants from association analyses were mapped to genes using three methods: eQTL (esophageal mucosa in GTEx v8^96^, immune cells in the eQTL catalogue^97^, and DICE^98^), chromatin interaction (based on FANTOM5 project chromatin looping data in esophageal and immune cells^99^), and position (within 10 kb of the transcribed region). Mapping was performed using the FUMA GWAS tool v1.5.0 with default parameters unless otherwise specified^95^. The HLA region was excluded from mapping due to its complex LD structure.
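Of the three mapping methods, positional mapping is the simplest to illustrate. The Python sketch below assigns a variant to every gene whose transcribed region, extended by 10 kb on each side, contains the variant. It is an illustration only and does not reproduce FUMA's implementation; the gene tuples and function name are hypothetical.

```python
def positional_genes(chrom, pos, genes, flank_bp=10_000):
    """Return names of genes whose transcribed region lies within
    `flank_bp` of the variant.

    genes: iterable of (name, chromosome, start_bp, end_bp) tuples,
    with start_bp <= end_bp on the same coordinate system as `pos`.
    """
    return [
        name
        for name, gene_chrom, start, end in genes
        if gene_chrom == chrom and start - flank_bp <= pos <= end + flank_bp
    ]
```

A variant can map to several genes, or to none, under this rule; eQTL and chromatin-interaction mapping can then nominate genes that are farther away.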
RNA sequencing
mRNA was isolated from the distal esophagus of patients with active EoE and of non-EoE controls as previously described^19,100,101^. EoE biopsies showed active disease pathology at the time they were taken, and no patient was receiving glucocorticoid treatment at the time of biopsy. RNA sequencing (RNA-seq), acquiring 10 million mappable 75-base-pair reads from paired-end libraries, was performed at the Genomics Sequencing Facility at CCHMC. Data were aligned using Ensembl^102^ annotations as a guide for TopHat^103^ using the default parameter settings. Expression analysis was performed by DESeq2 in BioWardrobe using the default parameter settings^104,105^.
Polygenic risk scores
PRS were calculated using PRS-CS (continuous shrinkage), a Bayesian polygenic prediction method that infers posterior effect sizes of genetic variants using GWAS summary statistics in the context of LD between variants as assessed on an external reference panel (i.e., the Phase 3 release of the 1000 Genomes Project data)^106^. The PRS training process used the Discovery GWAS summary statistics (MTAG) and the individual-level genotype data of the training GWAS data from a prior study^19^ to tune the hyper-parameters of the prediction model using CS (auto mode) so that the pipeline automatically learned the sparseness of the genetic architecture from data and adjusted for the LD structure accordingly^106^.
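The text does not spell out the scoring step that follows effect-size estimation. Conventionally, an individual's PRS is the sum of effect-allele dosages weighted by the posterior effect sizes, and scores are standardized so that effects can be reported per standard deviation. The Python sketch below illustrates that convention under those assumptions; the function names and data layout are hypothetical.

```python
from statistics import fmean, stdev


def polygenic_score(dosages, weights):
    """Weighted sum of effect-allele dosages for one individual.

    dosages: dict variant_id -> dosage of the effect allele (0 to 2).
    weights: dict variant_id -> posterior effect size for that allele.
    Variants without a dosage for this individual are skipped.
    """
    return sum(w * dosages[v] for v, w in weights.items() if v in dosages)


def standardize(scores):
    """Rescale scores to mean 0 and (sample) standard deviation 1."""
    mean, sd = fmean(scores), stdev(scores)
    return [(s - mean) / sd for s in scores]
```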
Confounding effects due to population stratification were adjusted using a linear regression model with the 10 principal components of ancestry in all participants^107^. After calculating a principal component–adjusted PRS, age and sex were used as covariates in a logistic regression fitting model implemented in R version 4.1.0.
PRS prediction accuracy and performance were assessed using the OR per 1 standard deviation in logistic regression after accounting for covariates (10 principal components, age, and sex). To measure the clinical utility of the EoE-PRS, we defined the top decile of the PRS distribution as a high polygenic score and report the odds of EoE in the top decile relative to the bottom decile.
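The top-versus-bottom decile comparison can be illustrated with a simplified Python sketch. Note the assumption: this version computes an unadjusted OR from a 2×2 table with a Wald 95% confidence interval, whereas the analysis described above used logistic regression with covariates. The function name is hypothetical, and tied scores at the decile boundaries are not treated specially.

```python
import math


def decile_odds_ratio(scores, is_case, z=1.96):
    """Unadjusted OR of case status, top vs. bottom score decile.

    scores: list of PRS values; is_case: parallel list of 0/1 or bool.
    Returns (odds_ratio, ci_lower, ci_upper) using a Wald interval on
    the log odds ratio. Raises ValueError if any 2x2 cell is empty.
    """
    n = len(scores)
    k = n // 10
    order = sorted(range(n), key=lambda i: scores[i])
    bottom, top = order[:k], order[n - k:]

    a = sum(1 for i in top if is_case[i])     # cases in top decile
    b = len(top) - a                          # controls in top decile
    c = sum(1 for i in bottom if is_case[i])  # cases in bottom decile
    d = len(bottom) - c                       # controls in bottom decile
    if min(a, b, c, d) == 0:
        raise ValueError("empty cell in 2x2 table; OR is undefined")

    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper
```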
References
- 1. Furuta GT, Katzka DA (2015) Eosinophilic Esophagitis. N Engl J Med 373:1640–1648. doi:10.1056/NEJMra1502863. PMID 26488694. PMC4905697
- 2. Abonia JP, Rothenberg ME (2012) Eosinophilic esophagitis: rapidly advancing insights. Annu Rev Med 63:421–34. doi:10.1146/annurev-med-041610-134138. PMID 22034864
- 3. Davis BP, Rothenberg ME (2016) Mechanisms of Disease of Eosinophilic Esophagitis. Annu Rev Pathol 11:365–393. doi:10.1146/annurev-pathol-012615-044241. PMID 26925500. PMC4918086
- 4. Simon D (2016) Eosinophilic esophagitis is characterized by a non-IgE-mediated food hypersensitivity. Allergy 71:611–20. doi:10.1111/all.12846. PMID 26799684
- 5. Warners MJ, Vlieg-Boerstra BJ, Bredenoord AJ (2015) Elimination and elemental diet therapy in eosinophilic oesophagitis. Best Pract Res Clin Gastroenterol 29:793–803. doi:10.1016/j.bpg.2015.06.013. PMID 26552778
- 6. Thel HL, Anderson C, Xue AZ, Jensen ET, Dellon ES (2024) Prevalence and costs of eosinophilic esophagitis in the United States. Clin Gastroenterol Hepatol. doi:10.1016/j.cgh.2024.09.031. PMID 39486752. PMC11761390
- 7. Hirano I (2020) Efficacy of Dupilumab in a Phase 2 Randomized Trial of Adults With Active Eosinophilic Esophagitis. Gastroenterology 158:111–122.e10. doi:10.1053/j.gastro.2019.09.042. PMID 31593702
- 8. Zuo L (2010) IL-13 induces esophageal remodeling and gene expression by an eosinophil-independent, IL-13Rα2-inhibited pathway. J Immunol 185:660–669. doi:10.4049/jimmunol.1000471. PMID 20543112. PMC3746758
