Full-length gene polymorphism of the non-classical HLA-E in Estonian individuals
Timo I. Olieslagers, Ingrid Tagen, Mathijs Groeneweg, Marcel G. J. Tilanus, Lotte Wieten, Christina E. M. Voorter

TL;DR
This study explores HLA-E gene variation in Estonia and finds it similar to that of neighboring European populations, with some novel alleles.
Contribution
Identification of four novel HLA-E alleles and insights into HLA-E polymorphism in the Estonian population.
Findings
16 different HLA-E alleles were identified in Estonian individuals, including four novel alleles.
HLA-E polymorphism at amino acid position 107 showed frequencies comparable to other populations.
No significant allele frequency differences were found between the South-East and other regions of Estonia.
Abstract
Estonia is a small country in the Baltic region of Northern Europe with 1.3 million inhabitants. As a coastal country, Estonia has been subject to migration influences. Owing to this population admixture, HLA gene diversity in Estonia is of interest with regard to allele frequencies, haplotypes, and polymorphism. In this study, we focused on HLA-E polymorphism within the Estonian population and compared it with the polymorphism identified in other populations. Full-length HLA-E sequencing of 143 individuals originating from Estonia showed dimorphism frequencies at amino acid position 107 (0.55 R vs 0.45 G) comparable to those of other populations. Within the study population, 16 different HLA-E alleles were identified, including four novel alleles. These 16 alleles encode four different protein variants. Despite a strong differentiation between the South-East and the rest…
Introduction
Human Leucocyte Antigen-E (HLA-E) belongs to the group of non-classical MHC class I molecules (class Ib) that play an essential role in immunomodulation. HLA-E shows a wide tissue distribution: healthy nucleated cells that express class I molecules also express HLA-E, albeit at a lower level than the classical class I molecules (Wieten et al. 2014). The structure of the HLA-E molecule resembles that of the classical HLA class I molecules, with three extracellular domains paired with β2-microglobulin (O’Callaghan et al. 1998; Koller et al. 1988). HLA-E primarily binds self-peptides derived from the leader sequences of classical class I molecules, thereby monitoring their expression (Sullivan et al. 2006; O’Callaghan et al. 1998). HLA-E has a dual function, playing a role in both the innate and the adaptive immune response (Sullivan et al. 2008; van Hall et al. 2010; Wieten et al. 2014; Kraemer et al. 2014). On the one hand, it serves as a ligand for both activating and inhibitory CD94/NKG2 receptors present on NK cells and on a subset of T cells. On the other hand, it interacts with both cytotoxic and regulatory CD8 T cells via their αβ T cell receptor (Wieten et al. 2014; Joosten et al. 2016; Grant et al. 2020).
The HLA-E gene is located on the short arm of chromosome 6 between the HLA-A and -C genes. The HLA genes are among the most polymorphic genes in the human genome, but HLA-E is more conserved than the classical HLA class I genes and shows only limited allelic variation. At present, 378 different HLA-E alleles have been assigned, but because many of them differ only by synonymous substitutions or by substitutions in non-coding regions, and one is a non-expressed (null) allele, they encode only 142 protein variants (IPD-IMGT/HLA Database v3.60) (Barker et al. 2023). For a long time, only two HLA-E protein variants were known, differing by a single amino acid (Arg/Gly) at codon 107 of the α2 domain: HLA-E*01:01 and *01:03. This is attributable to the predominant focus of earlier studies on exons 2 and 3, which encode the peptide-binding groove of the protein. Rapid developments in sequencing technology and the availability of samples from different populations have revealed more extensive variability and led to the identification of additional HLA-E protein variants, including those that we and others identified in previous studies of full-length HLA-E polymorphism in individuals from different populations (Olieslagers et al. 2017; Castelli et al. 2015; Felicio et al. 2014; Lucas et al. 2020). Recently, two large studies have confirmed this higher variability: Sauter et al. analyzed NGS HLA-E genotyping data from 2.5 million potential stem cell donors originating from 104 different populations (Sauter et al. 2021), and Lucas et al. performed full-gene SMRT sequencing of HLA-E in 6227 DNA samples (Lucas et al. 2023). All of these studies show that HLA-E is much more polymorphic than previously thought, but that the HLA-E*01:01 and *01:03 protein variants are by far the most common (~99% combined), with almost equal frequencies worldwide.
This has led to the suggestion that balancing selection acts on HLA-E, implying functional differences between these two protein variants (Olieslagers et al. 2017). A higher cell surface expression level has been reported for HLA-E*01:03 than for *01:01, attributed to its slightly higher peptide-binding affinity (Ulbrecht et al. 1999; Celik et al. 2016; Maier et al. 2000; Strong et al. 2003).
Although population studies on HLA-E typing are increasing, HLA-E variation has been studied only once in Estonia (Sauter et al. 2021), a relatively small Northern European country with around 1.3 million inhabitants. The Estonian population is known for its distinctive genetic profile, shaped by historical migrations, geographic isolation, and cultural influences, which have resulted in marked genetic structuring across the country, particularly between the South-East and the other regions (Nelis et al. 2009; Pankratov et al. 2020). These historical and geographic factors have substantially influenced the overall genetic diversity of Estonians. However, it remains unclear whether such structuring also affects HLA-E diversity within this population.
In this study, we aimed to comprehensively characterize the genetic variation of the HLA-E gene within the Estonian population by full-length sequencing of samples from 143 individuals of Estonian origin, representing diverse regions of the country. We also explored potential regional differences in HLA-E variation and examined the association between HLA-E alleles and the classical HLA-A, -B, and -C alleles. By focusing on this genetically distinctive population, our research contributes to a broader understanding of HLA-E diversity within populations and its implications for immune function.
Materials and methods
For this study, we selected 143 DNA samples from the 1089 samples provided by the Estonian Biobank, which consists of 10,317 samples from individuals born in Estonia, collected in 2005 (Nelis et al. 2009). To ensure comparable representation of each of the 15 Estonian counties, an approximately equal number of individuals was selected per county, with comparable age and gender distributions between the counties. Overall, the age of the individuals at the time of inclusion in the Biobank ranged from 18 to 83 years; 72 individuals were male and 71 were female. To compare regional differences within Estonia, the counties were grouped into regions: South-East (Põlva, Tartu, Valga and Võru), North-East (Jõgeva, Ida-Viru and Lääne-Viru), and a group referred to as other than South-East (Harju, Hiiu, Ida-Viru, Jõgeva, Järva, Lääne, Lääne-Viru, Pärnu, Rapla, Saare and Viljandi), which includes the North-Eastern counties. The study was approved by the Ethics Review Committee on Human Research of the University of Tartu (216/T-10, 28.06.2012). Written informed consent for participation was obtained from all study subjects.
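Writing the regional grouping down as sets makes one property of the design explicit: the North-Eastern counties are nested inside the "other than South-East" group, so the two regional comparisons share samples rather than forming independent partitions. The sketch below is illustrative only; the variable and function names are hypothetical and not part of the study's analysis.

```python
# County groupings as listed in the Methods; variable names are illustrative.
SOUTH_EAST = {"Põlva", "Tartu", "Valga", "Võru"}
NORTH_EAST = {"Jõgeva", "Ida-Viru", "Lääne-Viru"}
OTHER_THAN_SOUTH_EAST = {
    "Harju", "Hiiu", "Ida-Viru", "Jõgeva", "Järva", "Lääne",
    "Lääne-Viru", "Pärnu", "Rapla", "Saare", "Viljandi",
}
ALL_COUNTIES = SOUTH_EAST | OTHER_THAN_SOUTH_EAST


def check_grouping() -> None:
    """Sanity checks on the (hypothetical) encoding of the region definitions."""
    # South-East and "other than South-East" together partition the 15 counties.
    assert len(ALL_COUNTIES) == 15
    assert SOUTH_EAST.isdisjoint(OTHER_THAN_SOUTH_EAST)
    # North-East is a subset of "other than South-East", so the SE vs NE and
    # SE vs other comparisons share the North-Eastern samples.
    assert NORTH_EAST <= OTHER_THAN_SOUTH_EAST


check_grouping()
print(sorted(NORTH_EAST))
```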
The complete HLA-E gene was amplified from 5’UTR to 3’UTR as previously described (Olieslagers et al. 2017). Amplicons were purified by ExoSAP-IT (Affymetrix, Santa Clara, California) according to the manufacturer’s protocol. Sanger sequencing was performed with both forward and reverse sequencing primers as described previously (Olieslagers et al. 2017).
Allele frequencies were obtained by direct counting using the formula n/2N, where n is the number of observed copies of an allele and N is the number of individuals analyzed. The frequencies of HLA-E*01:01:01G, *01:03:01G and *01:03:02G in the different regions were compared using the chi-squared test. The "G" suffix denotes a group of alleles that share identical nucleotide sequences across the exons encoding the peptide-binding domains (exons 2 and 3 for HLA class I).
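As a minimal sketch of the direct-counting formula n/2N, the snippet below recomputes allele frequencies from the counts reported in Table 1. The dictionary layout and the function name `allele_frequencies` are illustrative assumptions, not the software used in the study.

```python
# Allele counts from Table 1 (143 individuals, so 2N = 286 chromosomes).
TABLE1_COUNTS = {
    "01:01:01:01/02": 126,
    "01:01:01:03": 19,
    "01:01:01:05": 3,
    "01:01:01:18": 1,
    "01:01:01:51": 1,
    "01:01:01:53": 1,
    "01:01:01:54": 1,
    "01:01:43": 1,
    "01:03:01:01": 9,
    "01:03:02:01": 102,
    "01:03:02:02": 5,
    "01:03:05:01": 3,
    "01:03:06": 1,
    "01:03:38": 1,
    "01:06:01:01/02": 10,
    "01:09:01:01/02": 2,
}


def allele_frequencies(counts: dict[str, int], n_individuals: int) -> dict[str, float]:
    """Direct counting for a diploid locus: frequency = n / 2N."""
    two_n = 2 * n_individuals
    total = sum(counts.values())
    if total != two_n:
        raise ValueError(f"counts sum to {total}, expected {two_n}")
    return {allele: n / two_n for allele, n in counts.items()}


freqs = allele_frequencies(TABLE1_COUNTS, 143)
# Prints the three most frequent alleles: 0.441, 0.357 and 0.066.
for allele, f in sorted(freqs.items(), key=lambda kv: -kv[1])[:3]:
    print(f"HLA-E*{allele}: {f:.3f}")
```

The explicit check that the counts add up to 2N catches transcription errors before any frequency is reported.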
To determine associations between HLA-E alleles and the classical class I HLA-A, -B and -C alleles, low-resolution typing of these loci was performed by Luminex SSO using commercial kits (One Lambda, LABType SSO) according to the manufacturer’s protocol. The data were analyzed with PyPop 0.7.0 (Lancaster et al. 2007). Complete HLA-A, -B, and -C typing results were obtained for 128 of the 143 individuals, and these 128 samples were used for the association analysis. A standard chi-squared test, as implemented in the PyPop software package, was used to assess whether the data fitted Hardy–Weinberg equilibrium.
Results
HLA-E allele and phenotype frequencies
The nucleotide polymorphism of HLA-E was studied in 143 Estonian individuals by full-length sequencing, covering the 5′UTR to the 3′UTR. Within this population, four alleles were identified that were not yet listed in the IPD-IMGT/HLA database at that time (HLA-E*01:01:01:51, *01:01:43, *01:01:01:53, and *01:01:01:54). The numbers and frequencies of the HLA-E alleles found in this study are given in Table 1. The alleles HLA-E*01:01:01:01/*01:01:01:02, *01:06:01:01/*01:06:01:02, and *01:09:01:01/*01:09:01:02 were each treated as a single group, because the amplicon did not cover 5′UTR positions −181 to −200, where the differences between these alleles are located. The HLA-E allele distribution was in Hardy–Weinberg equilibrium in the Estonian individuals (P = 0.78). HLA-E*01:01:01:01/02 was the most frequently observed allele (f = 0.441), followed by *01:03:02:01 (f = 0.357).

Table 1 HLA-E allele numbers and frequencies in the 143 Estonian individuals

| HLA-E allele | n | Frequency |
|---|---|---|
| *01:01:01:01/02 | 126 | 0.441 |
| *01:01:01:03 | 19 | 0.066 |
| *01:01:01:05 | 3 | 0.010 |
| *01:01:01:18 | 1 | 0.003 |
| *01:01:01:51 | 1 | 0.003 |
| *01:01:01:53 | 1 | 0.003 |
| *01:01:01:54 | 1 | 0.003 |
| *01:01:43 | 1 | 0.003 |
| *01:03:01:01 | 9 | 0.031 |
| *01:03:02:01 | 102 | 0.357 |
| *01:03:02:02 | 5 | 0.017 |
| *01:03:05:01 | 3 | 0.010 |
| *01:03:06 | 1 | 0.003 |
| *01:03:38 | 1 | 0.003 |
| *01:06:01:01/02 | 10 | 0.035 |
| *01:09:01:01/02 | 2 | 0.007 |
Within the study population, four different HLA-E protein variants were found: HLA-E*01:01, *01:03, *01:06, and *01:09. The phenotype frequencies (Table 2) show that 45% of the individuals are heterozygous for HLA-E*01:01 and *01:03. Comparing the protein sequences of these four HLA-E variants reveals only two distinct peptide-binding groove sequences: the groove of HLA-E*01:09 is identical to that of *01:01, and the groove of *01:06 is identical to that of *01:03; in both cases the difference is located in the α3 domain.

Table 2 HLA-E phenotype frequencies in the 143 Estonian individuals

| HLA-E phenotype | n | Percentage |
|---|---|---|
| *01:01/*01:01 | 42 | 29% |
| *01:01/*01:03 | 65 | 45% |
| *01:03/*01:03 | 24 | 17% |
| *01:01/*01:06 | 4 | 3% |
| *01:03/*01:06 | 6 | 4% |
| *01:03/*01:09 | 2 | 1% |
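To illustrate the kind of Hardy–Weinberg check described in the Methods, the sketch below collapses the Table 2 phenotypes to the R107G dimorphism, using the groove identities stated above (*01:09 carries R like *01:01, and *01:06 carries G like *01:03), and applies a Pearson chi-square goodness-of-fit test with one degree of freedom. This is a simplified biallelic test written for illustration; it is not the multi-allelic PyPop procedure that produced the reported P-value of 0.78, and the helper names are hypothetical.

```python
import math

# Phenotype counts from Table 2 (protein-level genotypes of 143 individuals).
TABLE2_COUNTS = {
    ("01:01", "01:01"): 42,
    ("01:01", "01:03"): 65,
    ("01:03", "01:03"): 24,
    ("01:01", "01:06"): 4,
    ("01:03", "01:06"): 6,
    ("01:03", "01:09"): 2,
}

# Residue at position 107 per protein variant, following the groove
# identities described in the text (*01:09 ~ *01:01, *01:06 ~ *01:03).
RESIDUE_107 = {"01:01": "R", "01:09": "R", "01:03": "G", "01:06": "G"}


def r107g_genotypes(counts):
    """Collapse protein-level phenotypes into RR / RG / GG genotype counts."""
    genotypes = {"RR": 0, "RG": 0, "GG": 0}
    for (a, b), n in counts.items():
        pair = sorted(RESIDUE_107[a] + RESIDUE_107[b], key="RG".index)
        genotypes["".join(pair)] += n
    return genotypes


def hwe_chi2_biallelic(rr, rg, gg):
    """Pearson chi-square goodness of fit to HWE at a biallelic site (df = 1)."""
    n = rr + rg + gg
    p = (2 * rr + rg) / (2 * n)  # frequency of R
    q = 1 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    chi2 = sum((o - e) ** 2 / e for o, e in zip((rr, rg, gg), expected))
    # Survival function of the chi-square distribution with 1 df.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return p, chi2, p_value


g = r107g_genotypes(TABLE2_COUNTS)
p_r, chi2, p_val = hwe_chi2_biallelic(g["RR"], g["RG"], g["GG"])
print(g, f"R = {p_r:.3f}", f"chi2 = {chi2:.4f}", f"p = {p_val:.3f}")
```

With these counts the genotype tally is RR = 42, RG = 71, GG = 30 (R frequency 155/286 ≈ 0.542), which almost exactly matches the Hardy–Weinberg expectation, so the statistic is close to zero.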
A previous study showed that the Estonian population is genetically structured, with strong differentiation between the South-East and the North-East, and between the South-East and the rest of the country (Pankratov et al. 2020). Since our study population was spread over the 15 Estonian counties, we determined whether HLA-E allele frequencies at the G-group level (HLA-E*01:01:01G, *01:03:01G and *01:03:02G) differed between these regions. Within our study population, the chi-square values for the comparisons between the South-Eastern (n = 40) and North-Eastern counties (n = 27), and between the South-Eastern and the other than South-Eastern counties (n = 103), were 3.11 (p = 0.21) and 1.06 (p = 0.59), respectively, showing no significant differences between the allele frequencies of these regions (Table 3).

Table 3 HLA-E allele frequencies across different regions of Estonia

| Region | *01:01:01G | *01:03:01G | *01:03:02G |
|---|---|---|---|
| South-East (n = 40) | 0.50 | 0.08 | 0.43 |
| North-East (n = 27) | 0.61 | 0.11 | 0.28 |
| Other than South-East (n = 103) | 0.56 | 0.08 | 0.36 |
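The regional comparison is a Pearson chi-square test of homogeneity on a 2 × 3 table of allele counts (two regions by three G groups), with 2 degrees of freedom. The sketch below assumes integer allele counts reconstructed from the rounded Table 3 frequencies and the number of alleles per region (80, 54 and 206); these counts are an assumption, since the study reports only frequencies. For exactly 2 degrees of freedom the p-value has the closed form exp(−χ²/2), so no statistics library is needed.

```python
import math

# Allele counts per region, reconstructed (assumption) from the rounded Table 3
# frequencies times the number of alleles per region (2 x individuals).
# Column order: *01:01:01G, *01:03:01G, *01:03:02G.
SE_COUNTS = [40, 6, 34]        # South-East, 80 alleles
NE_COUNTS = [33, 6, 15]        # North-East, 54 alleles
OTHER_COUNTS = [115, 17, 74]   # other than South-East, 206 alleles


def chi2_2xk(row_a, row_b):
    """Pearson chi-square test of homogeneity for a 2 x k table of counts."""
    col_totals = [a + b for a, b in zip(row_a, row_b)]
    grand_total = sum(col_totals)
    chi2 = 0.0
    for row in (row_a, row_b):
        row_total = sum(row)
        for observed, col_total in zip(row, col_totals):
            expected = row_total * col_total / grand_total
            chi2 += (observed - expected) ** 2 / expected
    return chi2, len(col_totals) - 1  # statistic, degrees of freedom


def chi2_sf_df2(x):
    """Chi-square survival function; the closed form exp(-x/2) holds only for df = 2."""
    return math.exp(-x / 2)


for label, other in (("SE vs NE", NE_COUNTS), ("SE vs other", OTHER_COUNTS)):
    stat, df = chi2_2xk(SE_COUNTS, other)
    print(f"{label}: chi2 = {stat:.2f}, df = {df}, p = {chi2_sf_df2(stat):.2f}")
```

Under this reconstruction the snippet gives χ² = 3.11 (p = 0.21) for South-East vs North-East and χ² = 1.06 (p = 0.59) for South-East vs the other counties. A library routine such as `scipy.stats.chi2_contingency` would return the same statistic for a 2 × 3 table.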
The present-day Estonian population has been shaped by various migration waves (Kivisild et al. 2021). To explore potential influences on HLA-E diversity, we compared the allele frequencies observed in this study with those reported by Sauter et al. (2021) for Estonians (n = 156) and neighboring populations, including Finns (n = 337), Russians (n = 14,896), Latvians (n = 301), Belarusians (n = 359), and Ukrainians (n = 2220). The Sauter et al. study is based on DKMS registry donors in Germany, with ethnicity determined through self-assessment at recruitment. Additionally, comparisons were made with three Asian populations, Japanese (n = 245), Indonesian (n = 256), and Chinese (n = 980), as they are genetically distinct from European populations. At the G-group level (*01:01:01G, *01:03:01G, and *01:03:02G), no difference was observed between the Estonian samples from this study and the Estonian population reported by Sauter et al. (p = 0.6950), and the differences with the neighboring European populations were small, although those with Finns (p = 0.0036) and Ukrainians (p = 0.0034) reached statistical significance (Table 4). As expected, the Estonian population differed clearly from the Asian populations (p < 0.0001).

Table 4 HLA-E allele frequencies in the studied Estonian individuals compared with the Estonian, neighboring, and some Asian populations reported by Sauter et al. (2021)

| Population | *01:01:01G | *01:03:01G | *01:03:02G | Comparison (p)* |
|---|---|---|---|---|
| Estonia, this study (n = 143) | 0.54 | 0.08 | 0.38 | |
| Estonia, Sauter et al. (n = 156) | 0.53 | 0.10 | 0.37 | 0.6950 |
| Finland, Sauter et al. (n = 337) | 0.49 | 0.16 | 0.34 | 0.0036 |
| Russia, Sauter et al. (n = 14,896) | 0.56 | 0.11 | 0.32 | 0.0601 |
| Latvia, Sauter et al. (n = 301) | 0.55 | 0.08 | 0.37 | 0.9572 |
| Belarus, Sauter et al. (n = 359) | 0.55 | 0.11 | 0.33 | 0.1918 |
| Ukraine, Sauter et al. (n = 2220) | 0.57 | 0.13 | 0.30 | 0.0034 |
| Japanese, Sauter et al. (n = 245) | 0.49 | 0.27 | 0.24 | < 0.0001 |
| Indonesian, Sauter et al. (n = 256) | 0.44 | 0.32 | 0.24 | < 0.0001 |
| Chinese, Sauter et al. (n = 980) | 0.42 | 0.28 | 0.30 | < 0.0001 |

*χ² test for the difference between the Estonian population (this study) and each of the other populations.
Given the close genetic relationship between the Estonian and Finnish populations, rooted in their shared history and geography (Kivisild et al. 2021), we further compared HLA-E variation between Estonians and Finns by analyzing SNP frequencies within the HLA-E gene in the Estonian subjects of this study and in Finnish individuals (n = 99) from the 1000 Genomes (1KG) Project (Table 5). This comparison used the publicly available phase 3 data of the 1KG Project (https://www.internationalgenome.org/data-portal/data-collection/phase-3), which provides whole-genome sequence data from 2504 individuals across 26 populations (1000 Genomes Project Consortium et al. 2015). Overall, the SNP frequencies were similar in the two populations, with some differences: at position 424, the T nucleotide (specific for HLA-E*01:03:02G) was more frequent in the Estonian individuals than in the Finnish population (f = 0.378 vs 0.298), whereas at position 1857, the T nucleotide (specific for HLA-E*01:06) was less common in the Estonian individuals than in the Finnish population (f = 0.035 vs 0.136). All SNPs observed in the Finnish population were also found in the studied Estonian individuals, but not vice versa.

Table 5 Comparison of SNP frequencies between the studied Estonian individuals and the Finnish population

| Location | gDNA position | Major allele | Minor allele | MAF Estonia | MAF Finnish |
|---|---|---|---|---|---|
| 5′UTR | −104 | A | G | 0.010 | 0 |
| E2 | 409 | G | A | 0.003 | 0 |
| E2 | 424 | C | T | 0.378 | 0.298 |
| E3 | 756 | A | G | 0.458 | 0.475 |
| I3 | 1014 | T | A | 0.017 | 0.035 |
| E4 | 1625 | G | C | 0.010 | 0 |
| E4 | 1822 | A | C | 0.007 | 0 |
| E4 | 1857 | C | T | 0.035 | 0.136 |
| E4 | 1859 | C | T | 0.003 | 0 |
| E5 | 2076 | C | T | 0.003 | 0 |
| I5 | 2706 | C | T | 0.003 | 0 |
| I6 | 2924 | C | T | 0.070 | 0.086 |
| I6 | 2937 | G | A | 0.003 | 0 |
| 3′UTR | 3224–3251 | A–#–C | del | 0.003 | 0 |
| 3′UTR | 3500 | T | C | 0.003 | 0 |

SNP frequencies for the Estonian individuals were calculated in this study, whereas those for the Finnish population were taken from the 1KG Project. The minor allele frequency (MAF) is indicated. E, exon; I, intron.
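The set comparison described above (every SNP seen in the Finnish 1KG sample is also seen in the Estonian sample, but not vice versa) can be checked directly from Table 5. The sketch below transcribes the two MAF rows; treating a MAF of 0 as "minor allele not observed" is an assumption about how the table was populated, and the helper name is hypothetical.

```python
# MAF per gDNA position, transcribed from Table 5 (0 = minor allele not observed).
MAF_ESTONIA = {
    "-104": 0.010, "409": 0.003, "424": 0.378, "756": 0.458, "1014": 0.017,
    "1625": 0.010, "1822": 0.007, "1857": 0.035, "1859": 0.003, "2076": 0.003,
    "2706": 0.003, "2924": 0.070, "2937": 0.003, "3224-3251": 0.003, "3500": 0.003,
}
MAF_FINNISH = {
    "-104": 0.0, "409": 0.0, "424": 0.298, "756": 0.475, "1014": 0.035,
    "1625": 0.0, "1822": 0.0, "1857": 0.136, "1859": 0.0, "2076": 0.0,
    "2706": 0.0, "2924": 0.086, "2937": 0.0, "3224-3251": 0.0, "3500": 0.0,
}


def observed_sites(maf: dict[str, float]) -> set[str]:
    """Return the positions at which the minor allele was seen at least once."""
    return {pos for pos, f in maf.items() if f > 0}


est = observed_sites(MAF_ESTONIA)
fin = observed_sites(MAF_FINNISH)

# Every Finnish site should also appear in the Estonian set ...
print("Finnish sites also seen in Estonians:", fin <= est)
# ... but not vice versa; iterate over the dict to keep the table order.
print("Seen only in Estonians:", [pos for pos in MAF_ESTONIA if pos in est - fin])
# Frequency differences at the shared sites (Estonia minus Finland).
for pos in MAF_ESTONIA:
    if pos in est & fin:
        print(pos, round(MAF_ESTONIA[pos] - MAF_FINNISH[pos], 3))
```

With the Table 5 values, the five shared sites are 424, 756, 1014, 1857 and 2924; the ten remaining sites are seen only in the Estonian individuals, most of them with a MAF of 0.003, i.e. a single chromosome out of 286.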
Haplotypes/associations
In this study, we used PyPop to investigate the associations between the HLA-E R107G dimorphism and other HLA class I alleles in the Estonian individuals. Although R and G are present in almost equal numbers in our study, they are not equally distributed over the different HLA-A and HLA-C alleles (Table 6A and 6B). HLA-A*01 was more frequently found with R than with G (R = 12% vs G = 1%), HLA-A*03 more frequently with G than with R (R = 2% vs G = 15%), and HLA-C*04 more frequently with G than with R (G = 10% vs R = 0%). The A~B~C haplotypes also show a distinct association with either R or G, except for A*02~B*27~C*02 and A*02~B*07~C*07, which show an approximately equal distribution for R and G (Table 6C). Furthermore, the HLA-B dimorphism in the leader peptide (−21 methionine (M) or −21 threonine (T)) showed no apparent pattern with R or G, suggesting no clear association.
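Table 6 Association of HLA-A alleles, HLA-C alleles, HLA-ABC haplotypes, and HLA-B leader peptide dimorphism with amino acid position 107 of HLA-E

(A) HLA-A

| HLA-A | R107 n | R107 HF | G107 n | G107 HF |
|---|---|---|---|---|
| 01 | 30.6 | 0.119 | 1.4 | 0.006 |
| 02 | 52.3 | 0.204 | 50.7 | 0.198 |
| 03 | 6.1 | 0.024 | 37.9 | 0.148 |
| 11 | 5.9 | 0.023 | 4.1 | 0.016 |
| 23 | N/A | 0 | 2.0 | 0.008 |
| 24 | 12.1 | 0.047 | 8.9 | 0.035 |
| 25 | 9.0 | 0.035 | N/A | 0 |
| 26 | 2.4 | 0.009 | 4.6 | 0.018 |
| 30 | 3.7 | 0.015 | 1.3 | 0.005 |
| 31 | 4.0 | 0.016 | N/A | 0 |
| 32 | 8.0 | 0.031 | N/A | 0 |
| 33 | N/A | 0 | 1.0 | 0.004 |
| 68 | 3.9 | 0.015 | 6.1 | 0.024 |

(B) HLA-C

| HLA-C | R107 n | R107 HF | G107 n | G107 HF |
|---|---|---|---|---|
| 01 | 6.0 | 0.023 | 9.0 | 0.035 |
| 02 | 11.2 | 0.044 | 19.8 | 0.077 |
| 03 | 14.9 | 0.058 | 24.1 | 0.094 |
| 04 | N/A | 0 | 26.0 | 0.102 |
| 05 | 11.0 | 0.043 | 2.0 | 0.008 |
| 06 | 23.1 | 0.090 | 5.9 | 0.023 |
| 07 | 48.0 | 0.188 | 26.0 | 0.101 |
| 08 | 3.0 | 0.012 | N/A | 0 |
| 12 | 8.4 | 0.033 | 1.6 | 0.006 |
| 14 | 5.0 | 0.020 | N/A | 0 |
| 15 | 6.3 | 0.025 | 2.7 | 0.011 |
| 16 | N/A | 0 | 1.0 | 0.004 |
| 17 | 1.0 | 0.004 | N/A | 0 |

(C) HLA-ABC haplotypes and HLA-B leader allele

| HLA-A | HLA-B | HLA-C | HLA-B leader allele | R107 n | R107 HF | G107 n | G107 HF |
|---|---|---|---|---|---|---|---|
| 01 | 08 | 07 | M | 17.9 | 0.070 | 1.1 | 0.004 |
| 03 | 07 | 07 | T | 1.4 | 0.006 | 12.7 | 0.050 |
| 02 | 27 | 02 | T | 6.2 | 0.024 | 6.7 | 0.026 |
| 03 | 35 | 04 | M | N/A | 0 | 11.8 | 0.046 |
| 02 | 07 | 07 | T | 3.8 | 0.015 | 6.4 | 0.025 |
| 02 | 44 | 05 | T | 8.8 | 0.034 | 1.2 | 0.005 |
| 02 | 13 | 06 | M | 9.7 | 0.038 | N/A | 0 |
| 02 | 40 | 03 | T | N/A | 0 | 8.5 | 0.033 |
| 02 | 15 | 03 | T | N/A | 0 | 5.9 | 0.023 |
| 03 | 15 | 03 | T | N/A | 0 | 6.0 | 0.023 |

Numbers (n) and haplotype frequencies (HF) for HLA-A alleles (pairwise LD = 0.025, p < 0.000) (A), HLA-C alleles (pairwise LD = 0.026, p < 0.000) (B), and the 10 most frequent HLA-ABC haplotypes with the HLA-B leader peptide dimorphism (C), associated with either R or G at amino acid position 107 of HLA-E.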
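The preferences discussed for individual HLA-A allele groups can be summarized as the share of each group's haplotypes that carry R107. The sketch below computes that descriptive ratio from the estimated counts in Table 6A, entering "N/A" cells as zero; it is a hypothetical summary, not the linkage disequilibrium statistic reported by PyPop.

```python
# Estimated haplotype counts from Table 6A:
# HLA-A allele group -> (n with R107, n with G107). "N/A" is entered as 0.0.
TABLE6A = {
    "01": (30.6, 1.4), "02": (52.3, 50.7), "03": (6.1, 37.9), "11": (5.9, 4.1),
    "23": (0.0, 2.0), "24": (12.1, 8.9), "25": (9.0, 0.0), "26": (2.4, 4.6),
    "30": (3.7, 1.3), "31": (4.0, 0.0), "32": (8.0, 0.0), "33": (0.0, 1.0),
    "68": (3.9, 6.1),
}


def r_fraction(r: float, g: float) -> float:
    """Share of an allele group's haplotypes carrying Arg (R) at HLA-E position 107."""
    return r / (r + g)


total = sum(r + g for r, g in TABLE6A.values())
for group, (r, g) in TABLE6A.items():
    print(f"HLA-A*{group}: n = {r + g:5.1f}, R share = {r_fraction(r, g):.2f}")
print(f"total haplotypes = {total:.1f}")  # 2 x 128 fully typed individuals = 256
```

For example, the R share is about 0.96 for HLA-A*01, 0.14 for HLA-A*03 and 0.51 for HLA-A*02, in line with the preferences described in the text; the counts add up to 256 haplotypes, two per individual in the 128 fully typed samples.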
Discussion
In this study, we performed full-length sequencing of the HLA-E gene in 143 Estonian individuals to identify SNPs in both coding and non-coding regions, including the 5′ and 3′ UTRs, and identified four novel alleles. The number of novel alleles found in this small population is comparable to that in other studies of full-length HLA-E polymorphism in specific populations (Castelli et al. 2015; Felicio et al. 2014; Olieslagers et al. 2017; Ramalho et al. 2017).
The genetic makeup of the Estonian population has been shaped by a combination of factors, including historical migrations, genetic admixture, and geographic location, resulting in a complex blend that reflects both historical and contemporary influences (Pankratov et al. 2020; Kivisild et al. 2021). However, our comparison of HLA-E allele frequencies and individual HLA-E SNP frequencies with those of other populations suggests that these factors had little detectable effect on HLA-E variation. Because our study used samples from individuals across the different counties of Estonia as representatives of the whole Estonian population, we could compare our findings with those of other populations. HLA-E variation within Estonia was conserved across the different regions and remained largely consistent with that of neighboring European populations. This suggests that the HLA-E gene serves a conserved function that is preserved across diverse genetic backgrounds, reflecting its crucial role in immune modulation.
The contrast between the low polymorphism of HLA-E and the high polymorphism of the classical HLA class I genes HLA-A, -B, and -C has prompted speculation about the function and the evolutionary history of HLA-E (Geraghty et al. 1992; Grant et al. 2020). To gain more insight into the latter, we evaluated the association of the classical class I HLA alleles with the R107G dimorphism of HLA-E. As can be seen in Table 6, some HLA-A and -C allele groups appear to associate preferentially with either R or G at amino acid position 107, such as HLA-A*01 (R 12% vs G 1%) and HLA-A*03 (R 2% vs G 15%). For some groups the association appears exclusive, such as HLA-A*25 (R 4% vs G 0%), HLA-A*32 (R 3% vs G 0%), and HLA-C*04 (R 0% vs G 10%), although the small sample sizes could influence these observations. This agrees with the findings of Liu et al. (2012), who reported, in a study of HLA-E in four different Chinese populations, that several common HLA-A alleles displayed a dichotomous association with HLA-E alleles in all four populations. On the other hand, some HLA-A allele groups, such as HLA-A*02 and *24, occur with both R and G. Geraghty et al. already described this pattern of restricted R107G associations for some HLA-A allele groups, such as R with A*01, alongside equal R/G distributions for others, such as A*02 (Geraghty et al. 1992). Although Geraghty et al. found a restricted association of HLA-A*24 with R, Carlini et al. found an equal distribution of HLA-A*24 with R and G, comparable to our results, in a study of linkage disequilibrium between HLA-A and non-classical HLA genes in 191 voluntary blood donors from south-eastern France (Carlini et al. 2016). Since HLA-A*01 and *03 have identical leader-derived peptides that can bind HLA-E (VMAPRTLLL), the leader peptide does not appear to determine whether R or G is present on the haplotype.
Taking the complete A~B~C haplotype into account (Table 6C), an even more distinct association can be observed: eight of the ten most frequent haplotypes show a clear preference for either R or G.
In conclusion, this study explored the polymorphism of the non-classical HLA-E gene in Estonian individuals, revealing several new SNPs and shedding light on its association with the classical HLA class I genes. Although the genetic makeup of the Estonian population reflects a blend of historical migrations and contemporary influences, HLA-E polymorphism appears to be conserved. The findings also revealed preferential associations between specific classical HLA-A and -C allele groups and HLA-E polymorphism. These insights contribute to a clearer understanding of the conserved nature of HLA-E variation and its stable presence across diverse populations, largely irrespective of historical or regional genetic influences.
