Genome-wide association analyses of autoimmune hypothyroidism reveal autoimmune and thyroid-specific contributions and an inverse relationship with cancer risk
Mary Pat Reeve, Masahiro Kanai, Daniel B. Graham, Juha Karjalainen, Shuang Luo, Nikita Kolosov, Cameron Adams, Jarmo Ritari, Konrad J. Karczewski, Tuomo Kiiskinen, Yu Jiang, Zachary Fuller, Juha Mehtonen, Mitja I. Kurki, Zia Khan, Mary Pat Reeve, Mary Pat Reeve, Juha Karjalainen

TL;DR
This study identifies 418 genetic signals linked to autoimmune hypothyroidism and classifies them into general autoimmunity and thyroid-specific categories, revealing a protective effect against skin cancer.
Contribution
The study introduces a Bayesian classification method to distinguish systemic autoimmunity and thyroid-specific loci in autoimmune hypothyroidism.
Findings
418 independent genetic signals were identified for autoimmune hypothyroidism.
38% of loci are involved in general autoimmunity, while 20% are thyroid-specific.
10% of AIHT loci show a protective effect against skin cancer.
Abstract
The high prevalence (>5%) of autoimmune hypothyroidism (AIHT) provides a unique opportunity to dissect genetic contributions to systemic and organ-specific autoimmunity. Here we performed a genome-wide association meta-analysis of 81,718 AIHT cases in FinnGen and the UK Biobank, identifying 418 independent signals (P < 5 × 10−8). At 48 of these loci, a protein-coding variant is, or is highly correlated (r2 > 0.95) with, the lead variant, including Finnish-enriched coding variants in LAG3, ZAP70 and TG. We demonstrated that ZAP70:T155M reduces T cell activation and broadly compare large-scale scans of nonthyroid autoimmunity and thyroid-stimulating hormone levels with a Bayesian classifier to assign loci into distinct groupings, estimating that 38% are involved in general autoimmunity whereas 20% are thyroid specific. We further identified substantial antagonistic pleiotropy, with 10% of…
Genes, proteins, chemicals, diseases, species, mutations and cell lines named across the full text — each resolved to its canonical identifier and authoritative record.
Click any figure to enlarge with its caption.
Figure 1
Figure 2- —FinnGen
- —https://doi.org/10.13039/100000002U.S. Department of Health & Human Services | National Institutes of Health (NIH)
- —Aging Biology Foundation
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsThyroid Disorders and Treatments · Diabetes and associated disorders · Genetic Associations and Epidemiology
Main
Hypothyroidism is estimated to affect at least 5% of individuals, although underdiagnosis suggests that this may be a substantial underestimate^1,2^. In areas of the world with iodine sufficiency, the most common cause of hypothyroidism is Hashimoto’s disease, an autoimmune attack on the thyroid leading to reduced thyroid hormone production. As such, it constitutes the most common autoimmune disease^3^, although at the same time it is considered vastly underdiagnosed considering the nonspecific and gradual onset of the many clinical symptoms that it causes. Early detection is key because treatment and supplementation with levothyroxine (synthetic thyroid hormone (T_4_)) can largely alleviate symptoms and prevent longer-term complications.
Given the well-established sharing of genetic risk factors across autoimmune diseases, a genetic study of hypothyroidism at scale would be expected to provide general insights into autoimmunity, as well as specific insights into thyroid disease, which might aid in early detection and effective treatment before substantial thyroid damage has taken place.
Population-wide biobank resources enable large-scale integration of clinical information across medical domains. Here we utilize FinnGen’s integration of genome information with lifelong medical history in 10% of the Finnish population to detect all treated cases of hypothyroidism (removing major non-Hashimoto’s causes of hypothyroidism) and perform a genome-wide association study (GWAS) of AIHT. Repeating the same definitions in the UK Biobank, we present here a genome-wide analysis of >80,000 AIHT cases identifying 418 independent associations, far beyond recent studies^4–7^.
Results
Phenotypic definitions from registry data
To pursue a GWAS of AIHT first required phenotypic definitions that capture a large but specific set of AIHT. Unlike most autoimmune diseases diagnosed in specialty clinics with well-recorded data, many AIHT cases are detected in primary care and recognized through continuous levothyroxine use. Use of such data required careful removal of individuals who were hypothyroid due to thyroid ablation from Graves’ disease, thyroid cancer, thyroidectomy and congenital thyroid disease (Methods). Removal of nearly 10,000 such individuals from the broadest definition resulted in 54,752 cases of AIHT in FinnGen R12, yielding 231 genome-wide significant loci. Confirming our hypothesis that phenotypic restrictions would create a more homogeneous genetic phenotype, the larger, less-specific GWAS of 64,082 individuals treated for hypothyroidism contained substantially fewer (204 versus 231) genome-wide significant associations. Of note, we hypothesized that, although Graves’ disease and Hashimoto’s disease likely share autoimmune components, there might also be variants with opposite effects on thyroid function. Thus, we intentionally focused the primary scan on AIHT and describ the alignment of the results with the analogous meta-analysis of Graves’ disease.
Meta-analysis of autoimmune hypothyroidism
Having optimized the phenotypic definition in FinnGen, we implemented the analogous phenotype in UK Biobank (Methods), identified 26,966 cases and ran a standard inverse-variance weighted, fixed-effects meta-analysis. Using a strict linkage disequilibrium (LD)-based definition of independence (Methods), this meta-analysis (81,718 cases, 732,951 controls) produced a total of 417 independent genetic associations outside the major histocompatibility complex (MHC; Supplementary Table 1) and numerous highly significant MHC associations centered at previously reported common variants spanning the DRB1–DQA1–DQB1 locus^5^. Even conservatively ascribing associations within 1 Mb to the same ‘locus’ indicates at least 280 distinct genomic regions associated with AIHT. Both increased sample size and improved phenotype specificity led to improved power in this study compared to prior studies (Supplementary Table 10). Replication in the recently released MVP dataset^8^ (Supplementary Table 1) shows significant replication (one-sided P < 0.05) and directional consistency at 89% and 98% of loci, respectively.
Overview of association signals
At 48 of 417 associations, the lead variant is itself, or in very high LD (squared correlation coefficient, r^2^ > 0.95) with, a protein-coding variant (Table 1). Among these are well-established common variants (for example, PTPN22, SH2B3 and FUT2), as well as lower-frequency hypomorphic alleles (for example, TYK2 and IFIH1) associated with many autoimmune diseases. Of the associations, 16 are low-frequency variants highly enriched in Finns (from 4-fold to >100-fold; Table 2), 12 of which are found only in the FinnGen GWAS because of their low frequency in the UK Biobank (UKBB). Notably, 6 of these 12 map on to coding variants noted above.Table 1. Coding variants implicated by AIHT genome-wide association meta-analysesLead variantP valueAllele frequency (FIN)r^2^ codingAnnotation12:111446804:T:C9.07 × 10^−234^0.5951.00SH2B3:W262R2:162267541:C:T2.09× 10^−^^32^0.5851.00IFIH1:A946T16:50301163:C:A4.66 × 10^−^^30^0.008721.00ADCY7:D439E2:162268127:T:C6.91 × 10^−^^28^0.01941.00IFIH1:I923V19:10352442:G:C1.17 × 10^−^^24^0.03091.00TYK2:P1104A8:132887335:C:T2.01× 10^−^^21^0.001451.00TG:Q655X1:2562891:G:A6.82 × 10^−^^21^0.1211.00TNFRSF14:V241I6:166929653:G:A6.25 × 10^−^^16^0.05791.00RNASET2:R236W19:11416089:T:G9.88 × 10^−^^16^0.5761.00RGL3:H162P11:94179472:A:G3.41 × 10^−^^14^0.08591.00PANX1:I272V19:48703417:G:A1.07 × 10^−^^13^0.3741.00FUT2:W154X2:97658891:G:A4.22 × 10^−^^13^0.2341.00ACTR1B:A143V, ANKRD36:K5S_fsTer293:58197909:G:A1.70 × 10^−^^11^0.0611.00DNASE1L3:R206C19:3179519:C:T1.12 × 10^−^^10^0.05461.00S1PR4:R243C19:49665663:G:A2.77 × 10^−^^10^0.003131.00IRF3:A277T11:308290:T:C4.05 × 10^−^^10^0.4821.00IFITM2:V33A13:42574410:C:G9.51 × 10^−^^10^0.04131.00TNFSF11:P36A1:185182538:G:A1.03 × 10^−^^9^0.3451.00SWT1:H536R, I148V10:113588287:G:A1.27 × 10^−^^9^0.02721.00HABP2:G534E23:154018741:A:G1.60 × 10^−^^9^0.8531.00IRAK1:F196S, S532L12:6773332:C:A1.67 × 10^−^^9^0.000981.00LAG3:P67T19:21537632:CA:C5.24 × 10^−^^9^0.007241.00ZNF429:H527L_fsTer1577:127375029:G:A5.53 × 10^−^^9^0.04231.00ZNF800:P103S3:101852100:G:C5.67 × 10^−^^9^0.01461.00NFKBIZ:G102A7:44977336:G:A8.71 × 10^−^^9^0.3361.00MYO1G:V49M1:113834946:A:G<1 × 10^−^^300^0.8551.00PTPN22:W620R1:116769497:A:G8.58 × 10^−^^9^0.1881.00CD2:H266Q10:122389836:C:T1.95 × 10^−^^16^0.291.00PLEKHA1:T320A17:39787478:C:A1.61 × 10^−^^11^0.04231.00IKZF3:G234A, GSDMB:N250G2:97890066:T:C7.60 × 10^−^^24^0.01930.99ZAP70:T155M8:60473974:T:C2.30 × 10^−^^18^0.4050.99RAB2A:L68P^^4:105282863:A:C1.08 × 10^−^^8^0.340.99TET2:I1762V3:12209924:G:A8.30 × 10^−^^38^0.7710.99SYN2:T506A5:139474618:A:C3.11 × 10^−^^13^0.7020.99SMIM33:G88S22:31146151:A:G1.67 × 10^−^^9^0.7820.99PLA2G3:S70A22:37187692:G:GC1.32 × 10^−^^57^0.3990.98C1QTNF6:G21V23:79185670:A:C3.53 × 10^−^^26^0.6290.98GPR174:S162P16:67628912:T:A1.18 × 10^−^^13^0.0390.98AGRP:A67T, LRRC36:S744G R222P G509S, KCTD19:E750K17:46019187:G:A5.63 × 10^−^^10^0.0810.98SPPL2C:R461P P643R G620R S224P I471V S601P, MAPT:V364A P277L R455W, KANSL1:I1085T1:7884770:C:A1.87 × 10^−^^8^0.02690.98PER3:P415A H417R22:29579458:A:G8.35 × 10^−^^10^0.8040.97THOC5:V579I14:105759223:C:T1.37 × 10^−^^16^0.390.97IGHG3:P221L S314N Y226F, IGHG1:L97R D239E L241M1:156825994:A:G1.44 × 10^−^^24^0.6270.97SH2D2A:N52S2:55631052:T:G1.35 × 10^−^^9^0.03670.97CFAP36:I246F, PNPT1:N590D11:61128363:A:G9.58 × 10^−^^9^0.6120.96CD5:A471V18:69860373:A:G2.34 × 10^−^^20^0.5360.95CD226:S307G9:125354538:A:G6.30 × 10^−^^10^0.5080.95GAPVD1:V334L^^17:7826527:C:A2.64 × 10^−^^8^0.09340.95KDM6B:T761T762del19:40278113:C:T1.54 × 10^−^^15^0.01060.88ZNF780B:K284N17:42137394:T:C2.17 × 10^−^^31^0.2030.85HSPB9:Q2P11:64340005:T:C1.24 × 10^−^^13^0.3850.85CCDC88B:D193E6:108989516:T:G1.66 × 10^−^^15^0.8240.83SESN1:L103I8:132905343:G:A9.93 × 10^−^^29^0.650.81TG:D1312G4:186086720:A:G1.59 × 10^−^^35^0.3680.80TLR3:L412F6:368019:T:A6.42 × 10^−^^11^0.004310.78IRF4:P88Q22:41428193:A:AAAT6.01 × 10^−^^12^0.7510.78C22orf46 (noncoding transcript)5:157186630:T:C1.02 × 10^−^^11^0.1060.78FAM71B:M564T6:108941239:G:C2.55 × 10^−^^9^0.4260.78ARMC2 (splice acceptor)^^9:124267351:A:G9.95 × 10^−^^46^0.4020.77PSMB7:V39A1:1238231:G:A1.38 × 10^−^^9^0.1540.75C1QTNF12:C231R A180V12:56627222:T:C5.73 × 10^−^^14^0.3170.74NACA:V336E20:33085232:A:T1.27 × 10^−^^9^0.06520.74BPIFB1:T464S2:200872011:T:C2.17 × 10^−^^17^0.2360.74NIF3L1:T324I5:35837132:G:A1.51 × 10^−^^21^0.3960.74IL7R:T244I3:36979158:T:C1.56 × 10^−^^12^0.3710.72MLH1:I219V11:10509469:T:C1.04 × 10^−^^13^0.03130.72IRAG1:A79T22:38700597:T:C1.43 × 10^−^^13^0.3990.70KDELR3:V199GGenome coordinates for lead variants given in human genome assembly GRCh38 (hg38). P values are from the FinnGen–UKBB inverse-variance weighted meta-analysis (Methods) and are uncorrected for multiple testing. Allele frequency (FIN), nonreference allele frequency in the FinnGen cohort; r^2^ coding, r^2^ (calculated from the FinnGen imputation reference panel) between lead variant and first coding variant listed. All coding variant annotations (in Annotation column) use MANE select transcript as reported in gnomAD v4, except those marked with an asterisk (‘’), which pertain to an alternative transcript(s) only.Table 2. Finnish-enriched variants in AIHT genome-wide association meta-analysesLead variantP valueAllele frequency (FIN)Enrichment (FIN)βs.e.Coding association12:6773332:C:A1.67 × 10^−^^9^0.00098inf0.630.104LAG38:132887335:C:T2.01 × 10^−^^21^0.00145inf0.8290.0872TG23:155548829:G:A2.36 × 10^−^^8^0.0265120.2−0.1130.020219:40278113:C:T1.54 × 10^−^^15^0.010693.60.2520.0316ZNF780B1:51075821:C:CA4.74 × 10^−^^10^0.0568530.09010.014516:27384341:C:CT2.17 × 10^−^^34^0.058250.60.1730.01412:97890066:T:C7.60 × 10^−^^24^0.0193380.2370.0235ZAP703:152243543:C:T4.51 × 10^−^^8^0.021237.30.1270.023312:519821:G:C1.63 × 10^−^^8^0.035116.90.1040.01856:368019:T:A6.42 × 10^−^^11^0.0043113.3−0.3760.0575IRF42:100705638:A:G3.96 × 10^−^^9^0.020910.40.1350.02329:3857583:G:A5.68 × 10^−^^10^0.001269.40.5450.087912:56182198:T:G6.88 × 10^−^^12^0.005217.30.310.045219:3179519:C:T1.12 × 10^−^^10^0.05465.8−0.08880.0155S1PR4Index associations (lead variant) to variants with fivefold or greater frequency in Finns than in non-Finnish–Swedish–Estonian Europeans (derived from gnomAD v3). The ratio of allele frequencies is reported in the Enrichment (FIN) column. All variants except the two marked with asterisks were not present in Pan UKBB results owing to the extreme low frequency. Allele frequency (FIN), nonreference allele frequency in the FinnGen cohort; β and s.e., β and s.e. from FinnGen GWAS (Methods). The Coding association column denotes variants also appearing in Table 1 at which a coding variant is, or is in high LD with, the lead variant. Genome coordinates reported from human genome assembly GRCh38 (hg38). P values are from the FinnGen–UKBB, inverse-variance weighted meta-analysis (Methods) and are uncorrected for multiple testing.
Some 51 lead variants have a minor allele frequency (MAF) <5% in Finland, 16 of which (31.4%) are in the most likely ‘coding association’ (r^2^ > 0.95; Table 1) category, whereas only 32 of the 366 (8.7%) more common variants are—a significant (P < 0.0005) 3.5-fold excess reflecting both natural selection seen broadly across GWASs (higher effect alleles are kept lower in frequency, although not directionally with respect to disease) and function (higher effect alleles detected in frequency-agnostic GWAS analysis are more often coding than lower effect ones)^9,10^.
Lower frequency-associated coding variants provide direct clues to disease biology. Noteworthy findings include a missense variant in LAG3 (P67T), an inhibitory immune receptor with no prior reported genetic associations for which inhibitors have been recently approved as immunotherapy in advanced melanoma^11,12^. The most notable new Finnish association, however, is a noncoding variant (Chr. 16:27384341:C:CT) ~20 kb from the transcription start site of IL21R. In addition, rare missense variants in both IRF3 and IRF4 confer protection from AIHT (with effects exceeding protective hypomorphic missense variants at TYK2 and IFIH1), as does a low-frequency risk variant of NFKBIZ.
We also observed unexpected pleiotropy. A missense variant in ZNF800 (P103S) that increases hypothyroid risk also increases cataract risk in FinnGen and lowers alkaline phosphatase levels and bone mineral density in UKBB (all P < 1 × 10^−20^). Two PER3 variants in complete LD (Pro415Ala and His417Arg) that lower AIHT risk are associated with morning chronotype in UKBB^13^ and reported^14^ to segregate in a family with familial advanced sleep phase syndrome and demonstrated to reduce PER3 protein levels and repressor activity.
Hypomorphic mutation of ZAP70 drives autoimmunity and immune deficiency
Another new Finnish-enriched association is a missense variant in ZAP70 (Thr155Met), which encodes a tyrosine kinase essential for signal transduction downstream of the T cell receptor (TCR) in response to antigen recognition. ZAP70 mutations cause a severe autosomal recessive combined immunodeficiency^15–17^ marked by absence of CD8^+^ T cells and CD4^+^ T cells that do not respond to TCR-mediated activation. Given prior suggestions that ZAP70 inhibition might be therapeutically efficacious in autoimmunity^18,19^, and the parallel to TYK2, where an allelic series of immunodeficiency and autoimmune protective alleles led to a recently approved therapeutic, we sought to further elucidate the function of ZAP70:Thr155Met.
ZAP70 exists in an autoinhibited conformation at baseline and, on productive TCR engagement, the two tandem SH2 domains of ZAP70 bind to phosphotyrosine motifs in CD3ζ associated with the TCR. This recruitment coincides with phosphorylation of ZAP70 in the interdomain B region and kinase domain by LCK, which elicits full activation of ZAP70, promoting phosphorylation of key substrates such as SLP76 and LAT that propagate the TCR signaling cascade^20^.
Several rare variants in ZAP70 cause Mendelian immunopathologies by ablating protein expression, impairing kinase activity or relieving autoinhibition^21^. Clinical manifestations are heterogeneous, most commonly combined immunodeficiencies with recurrent infections, although many patients exhibit paradoxical autoimmunity and lymphoproliferative syndromes^21^. We sought to determine how the Thr155Met variant impacts ZAP70 function and whether it is associated with a gain of function through loss of autoinhibition or impaired function. Toward this end, we reconstituted ZAP70-deficient Jurkat T cells with variants of interest and monitored TCR signaling. Wild-type, ZAP70-reconstituted cells upregulated expression of the activation marker CD69 and induced phosphorylation of SLP76 and ZAP70 after TCR stimulation, whereas parental ZAP70-deficient T cells did not (Fig. 1). Cells expressing a ZAP70 double tyrosine mutant (Tyr315Ala&Tyr319Ala) within interdomain B, which is incapable of activation, showed a complete block of activation and SLP76 phosphorylation after TCR stimulation (Fig. 1). Cells reconstituted with ZAP70:Thr155Met exhibited a partial block in activation and phosphorylation of SLP76 (Fig. 1). Thus, ZAP70:T155M associated with autoimmunity impairs TCR signaling strength through an incomplete loss of function. Consistent with this observation, Thr155Met also increases risk of immunodeficiencies in FinnGen (P < 0.0001).Fig. 1. Impaired T cell activation in the ZAP70 Thr155Met variant T cell line in response to TCR stimulation.a,b, ZAP70-deficient Jurkat cells (P116 clone) reconstituted with either ZAP70 or variants and stimulated with anti-CD3 and anti-CD28 (1 μg ml^−1^). Cells were then stained for detection of CD69 (a) or ZAP70, phospho-ZAP70 (pZAP70) and phospho-SLP76 (pSLP76) (b) before FACS analysis. Control, ZAP70-deficient cells infected with control virus; Thr155Met, ZAP70-deficient cells expressing ZAP70 Thr155Met; Try315Ala&Tyr319Ala, ZAP70-deficient cells expressing an inactive mutant ZAP70 Try315Ala&Tyr319Ala; WT ZAP70, ZAP70-deficient cells reconstituted with wild-type ZAP70.
Intersection of hypothyroidism with checkpoint inhibition
Programmed cell death protein 1 (PD-1) checkpoint inhibition to activate systemic immune responses has rapidly emerged as a critical tool in the cancer therapy arsenal, with considerable effort dedicated to understanding the underlying beneficial mechanisms, including enhancement of T cell priming and activation and reinvigoration of exhausted intratumoral T cells^22,23^. However, immune-related adverse events (irAEs), particularly new-onset autoimmune diseases such as hypothyroidism, type 1 diabetes, colitis, hepatitis, myocarditis and vitiligo, remain an important clinical challenge in the use of these important drugs. Recent observations^24–27^, however, have demonstrated that individuals with irAEs may be receiving greater benefit from checkpoint inhibition, an observation confirmed in a recent meta-analysis^28^. In one study, anti-PD-L1 atezolizumab-induced thyroid dysfunction was associated with longer survival across seven trials in six cancer types. Furthermore, in one trial, patients with higher hypothyroid polygenic risk scores (PRSs) had both higher rates of atezolizumab-induced thyroid dysfunction and a lower risk of death in triple-negative breast cancer^27^.
The relationship between checkpoint inhibition and hypothyroidism in clinical practice, alongside observing individual associations to CTLA4 and LAG3, encouraged us to examine this further. Starting with the other two targets with approved drugs, PD-1 (encoded by PDCD1) and its ligand PD-L1 (encoded by CD274), we found that the locus at CD274 contains a nearby upstream genome-wide significant variant (rs911760) (and a second independent signal at neighboring PDCD1LG1). Binding of cytotoxic T lymphocyte-associated protein 4 to CD80 and CD86 prevents continued T cell activation and, among our strongest genome-wide significant associations, we observed variants in LD with the well-described CT60 variant at CTLA4 (rs3087243), which is correlated with increased cytotoxic T lymphocyte-associated protein 4 levels on CD4^+^ T cells^29^—consistent with tamping down immunity with consequent lowering of risk to AIHT (β = −0.155, P = 2.7 × 10^−127^). We further observed genome-wide significant associations at both CD80 and CD86 loci with associations spanning CD80–TIMMDC1 (P = 4.3 × 10^−14^) and at the ILDR1–CD86 locus (P = 4.4 × 10^−10^). Collectively, these genetic findings suggest a strong relationship between AIHT and the immune checkpoint pathway. The consistency of allelic effects of these AIHT associations with the induced effects of checkpoint immunotherapy supports the idea that irAEs commonly seen in checkpoint immunotherapy represent an on-target effect, as suggested by trial studies^27^.
Further to this intersection, we integrated published proteomics data^30^, fine-mapped the GWAS signals of 1,500 protein levels and found that 7 AIHT loci were significantly associated with PD-1 levels. Supporting a direct relationship, in all seven, the allele increasing hypothyroid risk increased soluble PD-1 levels, with most in the subgroup that were associated with broader autoimmune disease risk and cancer protection described below (Supplementary Table 2).
Genetic dissection of hypothyroid risk
Autoimmunity component
As AIHT sits at the nexus between thyroid disease and autoimmunity and occurs at a high frequency, we hypothesized that insights into the underpinnings of both systemic autoimmunity and specific thyroid disease processes would be present. To explore this, we performed a similar meta-analysis of individuals with a nonthyroid-based autoimmune disease (Methods) (excluding anyone with any form of thyroid disease). This scan, hereafter termed ‘autoimmune nonthyroid (AInonT)’ in FinnGen + UKBB, had 70,570 cases and 741,401 controls. Unsurprisingly, there was considerable overlap between AInonT and AIHT scans, with 62 of the 417 index variants from AIHT showing P < 1.2 × 10^−4^ (0.05/417) and 96 with P < 0.0024 (1/417), including 92 of 96 with the same direction of effect (Supplementary Table 3).
Thyroid-specific component
Distinct from autoimmunity, congenital hypothyroidism most often results from gene defects in thyroid development (agenesis or dysgenesis) or in thyroid hormone production (dyshormonogenesis) and has been explored in animal and cellular models^31^. Crossreferencing our GWAS signals within 100 kb of genes on the Genomics England clinical panel for congenital hypothyroidism (https://panelapp.genomicsengland.co.uk/panels/31/download/34/) indicates that independent common variation at six of these gene loci are associated with AIHT (Supplementary Table 4), providing pointers to the thyroid-specific effects in our scan.
Expanding to population-wide hormone production variability, thyroid-stimulating hormone (TSH) levels are broadly used in clinical settings to diagnose hypothyroidism. Recent publications^32,33^ provided a scan of population-wide TSH levels across multiple biobanks, demonstrating a strong polygenic architecture with hundreds of genome-wide significant associations to serum TSH levels. Using clinical laboratory values available on FinnGen participants since 2014, we performed a serum TSH scan on 226,947 AIHT controls—ensuring independence from the AIHT case–control scan and enabling identification of AIHT associations arising from direct impact on thyroid development or function. Our 417 AIHT index variants similarly show a highly significant excess of overlapping associations with 124 (at P < 1.2 × 10^−4^) and 152 (at P < 0.0024) associated with TSH levels (Supplementary Table 3). Consistent with expectation, 149 of 152 overlaps show that increasing AIHT risk corresponds to higher TSH levels.
To explore shared effects between AIHT and AInonT and TSH in more detail, we applied linemodels, a Bayesian classification algorithm^34^, to compare the effect sizes of the 417 AIHT index variants with those in the AInonT and TSH scans separately. We specifically asked whether a model in which there are two groups of variants (roughly ‘shared’ and ‘AIHT specific’) fits the observed effect sizes better than a single relationship and, in the two-group case, assigned group membership probabilities to each variant. Comparing AIHT to AInonT, a 2-group solution (termed AIHT only and shared AI) was a vastly better fit with many variants assigned strongly to one or the other group (57 having ≥99% confidence of being shared (termed AIHT–AInonT-99 below) and 71 having ≥99% confidence of being associated with AIHT only) (Fig. 2 and Supplementary Table 3). Running the same comparison between AIHT and TSH summary statistics produced an even more tail-heavy posterior assignment probability distribution between two groups with 51 having >99% confidence in the shared AIHT–TSH group (termed AIHT–TSH-99), whereas there were 231 with 99% confidence in the ‘AIHT-only’ group.Fig. 2. Analysis of shared effects using linemodels.a–d, Scatter plots of effect sizes from AIHT-associated variant effect sizes compared with effect sizes from nonthyroid autoimmune diseases from FinnGen + UKBB (a), TSH levels from FinnGen (b), skin cancer from FinnGen + UKBB (c) and Graves’ disease from FinnGen + UKBB (d). Optimal two-group fit of Bayesian linemodels run with scale and correlation parameters were fixed and displayed with red indicating an association to AIHT only and blue to both AIHT and the query phenotype. As recommended, β and s.e. were transformed into √(heritability) scale by multiplying both by √(2× MAF × (1 − MAF)). Data points are colored when linemodels group assignments are >95% probability. (Full data and assignment probabilities are listed in Supplementary Table 3.) cors, correlations.
Notably, these sharing groupings were significantly nonoverlapping: the 57 AIHT–AInonT-99 variants were completely distinct from the 51 AIHT–TSH-99 variants and Spearman’s correlation across all 417 AIHT loci between the AIHT–AInonT and AIHT–TSH sharing probabilities was highly significant (ρ = −0.36, P = 2.4 × 10^−14^; Supplementary Table 3). As weaker associations are less able to be assigned confidently to shared or nonshared classes, we estimated sharing proportions from the top half of associations (204 AIHT index variants with P < 1 × 10^−11^) and observed that 38% of associations are likely shared with AInonT (P > 0.8) and 20% shared with TSH (Fig. 2).
Confirming the functional distinction between these sets, we intersected our lead variants with fine-mapping of eQTLs from the expression quantitative trait loci (eQTL) catalog (Supplementary Table 12). Among the 51 AIHT–TSH-99 loci, a strong thyroid excess was seen with 22 mapped to thyroid eQTLs, whereas 4 mapped to T cell eQTLs. By contrast, the 57 AIHT–AInonT-99 loci showed the opposite skew, with 26 mapping to T cell eQTLs and only 3 to thyroid eQTLs. We also utilized the above-mentioned UKBB proteomics data, finding 27 of our index variants significantly associated (P < 5 × 10^−8^) in trans to TSHB (one of two TSH subunits) levels—all with a higher TSHB corresponding to AIHT risk. Of the 27, 26 are among the AIHT–TSH-99 group, with none in the AIHT–AInonT-99 group. Collectively, this confirms that the genetic architecture of AIHT consists of distinct, independent components—one representing processes shared across autoimmune diseases and the other representing thyroid-specific functional contributors.
Autoimmune thyroid disease generally refers to both Hashimoto’s disease (hypothyroidism) and Graves’ disease (hyperthyroidism). We utilized the FinnGen + UKBB meta-analysis of Graves’ disease (Methods), which had a total sample of 6,550 cases and 823,242 controls. Despite the much smaller sample, there was, as expected, a highly significant overlap with AIHT. Among the 417 AIHT index variants were 56 (at P < 1.2 × 10^−4^) and 101 (at P < 0.0024) at the thresholds where 0.05 and 1 are expected by chance. These associations were, however, not unidirectional, with 86 being shared and 15 in the opposite direction. This distinction falls along and reinforces the same dimension described in the earlier linemodels analysis. Specifically, 14 of 15 loci where Hashimoto’s disease and Graves’ disease have opposite-direction effects are in the AIHT–TSH-99 group (with 0 in AIHT–AInonT-99), whereas, among the 86 shared direction variants, 23 were in the AIHT–AInonT-99 group (with 0 in AIHT–TSH-99). Thus, the two common forms of autoimmune thyroid disease appear to tightly share their autoimmune component, whereas their thyroid-specific component is also shared but acts in opposite directions of risk and protection in the two diseases.
Inverse genetic risk shared with skin cancer
Although the relationships to autoimmunity and thyroid function are unsurprising, we used the comprehensive phenotypes of FinnGen to explore overlap with other common diseases via both correlation of disease incidence with AIHT genetic risk and by examination of coincident genome-wide significant loci. AIHT PRS was positively correlated (at a conservative threshold of P < 1 × 10^−5^) with hundreds of FinnGen endpoints. Given the widespread pleiotropy described above, it was unsurprising that there were numerous positive correlations with autoimmune and thyroid disease phenotypes. We also observed significant negative relationships for seven FinnGen cancer endpoints (Supplementary Table 5), led by the incidence of basal cell carcinoma (r = −0.07, P = 1.7 × 10^−17^) and all skin cancers (r = −0.06, P = 8.3 × 10^−13^), with less significant negative overlaps also seen for prostate cancer and the ‘all cancer’ phenotype.
To explore this further, we performed the FinnGen + UKBB meta-analysis of skin cancer (including melanoma and nonmelanoma: 68,822 cases and 657,740 controls) and examined results at the 417 AIHT index variants (Supplementary Table 3). In this set, there were 13 loci exceeding genome-wide significance, 26 at a comparison-wide level of significance (P < 1.2 × 10^−4^) and 48 at a level expected once by chance in 417 loci (P < 0.0024). Almost as striking as the excess itself, 24 of the 26 (and 42 of the 48) showed effects on risk in skin cancer and hypothyroidism that were in opposite directions.
We used linemodels and identified AIHT-only and AIHT–skin cancer-shared groups, the latter with a negative slope, indicating variants at which hypothyroid risk alleles correspond to skin cancer protective alleles, and assigned posterior probabilities of assignment to each group for all 417 loci. We then performed Spearman’s correlation between the probability membership in the AIHT–skin cancer-shared group with the previously defined probabilities of AIHT–AInonT and AIHT–TSH shared groupings. The AIHT–skin cancer membership was positively correlated with AIHT–AInonT (ρ = 0.21, P = 1.6 × 10^−5^) and negatively correlated with AIHT–TSH sharing (ρ = −0.24, P = 8.1 × 10^−7^), indicating that the shared component of skin cancer and AIHT represents an immune program rather than one specific to the thyroid. Among the 22 variants with >95% posterior assignment to the AInonT–skin cancer shared group are well-known missense variants in PTPN22, TYK2, IFIH1, FUT2 and CCDC88B, as well as the low-frequency ZAP70 and IRF3 variants noted above, an intronic FLT3 variant that prematurely truncates FLT3 and other established immune-mediated disease loci (CTLA4, BACH2, STAT2, PTPN2 and IL2RA).
To confirm these relationships in independent samples, we examined association of a hypothyroidism PRS calculated from UKBB alone against all phenotypes in FinnGen (Methods). As expected, the PRS was strongly positively associated with AIHT (odds ratio (OR) = 1.71, P < 1 × 10^−300^) and numerous other autoimmune diseases, including type 1 diabetes, seropositive rheumatoid arthritis and vitamin B_12_ deficiency anemia (all P < 1 × 10^−100^) (Supplementary Table 6). By contrast, a strong negative association was found between the UKBB PRS and multiple cancer endpoints, led by basal cell carcinoma (OR = 0.91 (95% confidence interval (CI) 0.90–0.92), P = 3.0 × 10^−39^), all skin cancers (OR = 0.92 (0.91,0.93), P = 8.4 × 10^−36^) and the umbrella ‘all cancer’ endpoint (OR = 0.96 (0.95, 0.970), P = 6.0 × 10^−26^) as well as individually significant breast (OR = 0.96 (0.94, 0.97)) and prostate (OR = 0.94 (0.93, 0.96)) cancer endpoints (Supplementary Table 7). Results were robust to two reanalyses (1) removing the MHC from PRS calculation and (2) removing all FinnGen AIHT cases (to confirm independence from correlations with the diagnosed phenotype self-evidently related to the PRS). Naturally, this second analysis removes significant relationships to thyroid-related phenotypes but leaves the cancer and other autoimmune disease relationships intact (Supplementary Tables 8 and 9).
Discussion
Using the broad diagnostic and medication information of FinnGen and the UKBB, we present here the largest GWAS to date in autoimmune hypothyroidism. Extensive clinical data available in these biobanks, including prescription medication use, provided a more complete ascertainment of cases, while the diagnostic coverage in each permitted the exclusion of other common thyroid conditions. The resultant analysis included a total of 81,718 cases and yielded a total of 417 independent genome-wide significant variants in addition to the MHC, roughly doubling the numbers found in the largest previous studies^5,6^.
Of these 417 associations, 67 (16%) contained coding variants that were, or were in high LD with, the lead variant. As coding variants were twice as often found in low-frequency association credible sets, these provided the most interpretable pointers to new biological insights into AIHT. Among the coding variants likely driving associations were low-frequency coding variants in both IRF3 and IRF4, which, similar to the hypomorphic low-frequency variants in IFIH1 and TYK2 also seen here, broaden the set of disease-protective perturbations likely acting through the interferon response. This connection is further supported by a common missense variant at TLR3, encoding another interferon-inducing component of antiviral immunity and by a common missense variant in the interferon-inducible IFITM2. Another notable association at PER3 connects circadian regulation to AIHT via two tightly linked missense variants previously linked to morning chronotype and now demonstrated to be protective against AIHT.
We characterized ZAP70:Thr155Met, which demonstrates partial loss of function and autoimmunity, whereas homozygosity for complete loss-of-function alleles produces severe combined immunodeficiencies^21^. In addition, complex ZAP70 genotypes have been associated with autoimmunity. For example, compound heterozygosity for loss-of-function Arg192Trp and gain-of-function Arg360Pro variants caused autoimmunity, but required both alleles to precipitate disease^35^. Collectively, human genetics evidence suggests that ZAP70 function must be optimized within a narrow range; partially impaired activity or enhanced activity elicits autoimmunity^20^. Mouse models have been indispensable in demonstrating this concept and establishing mechanism. Hypomorphic ZAP70 alleles from chemical mutagenesis impaired TCR signaling strength, altering thymocyte development and selecting an autoreactive TCR repertoire with double-stranded DNA antibodies and hyper-immunoglobulin E syndrome^36^. Similarly, a spontaneously arising point mutation of Zap70 in SKG mice caused partial loss of function and impaired negative selection of autoreactive T cells, resulting in arthritis^37^. Adoptive transfer of naive SKG T cells into immunocompromised recipients was sufficient to induce arthritis, suggesting impaired central and peripheral tolerance^38^. The ZAP70:Thr155Met variant associated with AIHT appears to similarly impair TCR signaling, resulting in loss of tolerance, likely through combined effects of impaired negative selection of autoreactive T cells, lymphopenia-induced homeostatic expansion of pathogenic T cells, defective development or function of regulatory T (T_reg_) cells and resistance to peripheral tolerance mechanisms such as T_reg_ cell suppression or anergy induction^39^. These findings have critical therapeutic implications: as ZAP70:Thr155Met is a partial loss of function, targeting ZAP70 kinase activity with inhibitors to treat autoimmunity could come with unanticipated consequences. Complete inhibition of ZAP70 may ameliorate T cell-driven autoimmunity at the expense of immunodeficiency, whereas partial inhibition of ZAP70 may exacerbate self-tolerance dysregulation.
The power of this GWAS enables not only detection of significant polygenic overlaps with other phenotypes such as nonthyroid autoimmune diseases and TSH levels (neither of which is individually surprising) but also the determination that these particular overlaps make up distinct, nonoverlapping components of AIHT risk. Using Bayesian linemodels, we estimated that 38% of AIHT associations are shared with autoimmune diseases more broadly and 20% are shared with variation that elevates TSH levels unrelated to immunity. The parallel analysis of Graves’ disease and AIHT indicates that, although there is widespread same-direction sharing of the autoimmune components, the thyroid or TSH alleles act in opposite directions, consistent with the hyperthyroid versus hypothyroid character of each disease. The linemodels analysis of this pairing underscores two main components: shared alleles with similar, same-direction effects and shared association with opposite-direction effects, underscoring the importance of distinguishing autoimmune thyroid phenotypes in genetic articulation.
PRS analysis (with external PRS tested in FinnGen) demonstrated a correlation of AIHT PRS to lower risk of skin, as well as breast, prostate and ‘all’ cancer phenotypes. The ‘all cancer’ result suggests a broad protective signal shared by most or all cancers, with the sample size of breast and prostate simply being sufficiently large to be individually detected. However, the protective effect on skin cancer was significantly greater, with nonoverlapping CIs compared to all cancers. Considerable sharing between the AIHT and skin cancer GWASs indicated that the opposite-effect alleles conferring risk to AIHT and protection from skin cancer were concentrated in the autoimmune component of AIHT and unrelated to thyroid function specifically. Although skin cancer, and particularly basal cell carcinoma, showed uniquely strong opposite-direction effects, the same highly significant observation in prostate and breast cancer suggests that this is most likely a consequence of general immune surveillance and response to emergent solid tumors, which, although likely of differential relevance to different tumor types, are not specific to skin.
Furthermore, AIHT-associated genetic variants implicate most genes encoding successful targets of checkpoint immunotherapy, including a new rare coding variant in LAG3. In addition, seven AIHT risk alleles are significantly associated with soluble PD-1 levels in recently published proteomic data from the UKBB. Moreover, the genetic intersection between AIHT risk and protection from skin cancer sheds light on published observations that individuals with irAEs receive greater benefits from checkpoint immunotherapy. We demonstrated here the same hypothyroid genetic risk that predisposes to thyroid irAEs and improved immunotherapy outcomes also represent a general population-wide signature of cancer protection. This suggests that genetically mediated variation in immune surveillance or function, partially encoded in checkpoint genes, is an important contributor to interindividual variation in cancer risk and potentially highlights mechanisms that could be effective in prevention as well as treatment.
Methods
FinnGen ethics statement and study cohort
The FinnGen study (https://www.finngen.fi/en) is a public–private partnership founded in 2017, including Finnish universities, biobanks and hospital districts, as well as several pharmaceutical companies. The aim was to collect both National Health Records and genetic data from 500,000 Finns. The study participants included patients with acute and chronic diseases, healthy volunteers and population collections. R12 consisted of 520,210 individuals (55% women and 45% men, average age 61.9 years). Study participants in FinnGen provided informed consent for biobank research, based on the Finnish Biobank Act. Alternatively, separate research cohorts, collected before the Finnish Biobank Act came into effect (in September 2013) and the start of FinnGen (August 2017), were collected based on study-specific consents and later transferred to the Finnish biobanks after approval by Fimea (Finnish Medicines Agency), the National Supervisory Authority for Welfare and Health. Recruitment protocols followed the biobank protocols approved by Fimea. The Coordinating Ethics Committee of the Hospital District of Helsinki and Uusimaa (HUS) statement no. for the FinnGen study, under which this research is conducted, is HUS/990/2017.
The FinnGen study is approved by the Finnish Institute for Health and Welfare (permit numbers THL/2031/6.02.00/2017, THL/1101/5.05.00/2017, THL/341/6.02.00/2018, THL/2222/6.02.00/2018, THL/283/6.02.00/2019, THL/1721/5.05.00/2019 and THL/1524/5.05.00/2020); digital and population data service agency (permit numbers VRK43431/2017-3, VRK/6909/2018-3 and VRK/4415/2019-3); the Social Insurance Institution (permit numbers KELA 58/522/2017, KELA 131/522/2018, KELA 70/522/2019, KELA 98/522/2019, KELA 134/522/2019, KELA 138/522/2019, KELA 2/522/2020 and KELA 16/522/2020); Findata (permit numbers THL/2364/14.02/2020, THL/4055/14.06.00/2020, THL/3433/14.06.00/2020, THL/4432/14.06/2020, THL/5189/14.06/2020, THL/5894/14.06.00/2020, THL/6619/14.06.00/2020, THL/209/14.06.00/2021, THL/688/14.06.00/2021, THL/1284/14.06.00/2021, THL/1965/14.06.00/2021, THL/5546/14.02.00/2020, THL/2658/14.06.00/2021 and THL/4235/14.06.00/2021); Statistics Finland (permit numbers TK-53-1041-17 and TK/143/07.03.00/2020 (earlier TK-53-90-20), TK/1735/07.03.00/2021 and TK/3112/07.03.00/2021); and Finnish Registry for Kidney Diseases permission or extract from the meeting minutes on 4 July 2019.
The Biobank Access Decisions for FinnGen samples and data utilized in FinnGen Data Freeze 12 include: THL Biobank BB2017_55, BB2017_111, BB2018_19, BB_2018_34, BB_2018_67, BB2018_71, BB2019_7, BB2019_8, BB2019_26, BB2020_1 and BB2021_65; Finnish Red Cross Blood Service Biobank 12 July 2017, Helsinki Biobank HUS/359/2017, HUS/248/2020, HUS/430/2021 §28, §29, HUS/150/2022 §12, §13, §14, §15, §16, §17, §18, §23, §58, §59, and HUS/128/2023 §18; Auria Biobank AB17–5154 and amendment no. 1 (17 August 2020) and amendments BB_2021-0140, BB_2021-0156 (26 August 2021, 2 February 2022), BB_2021-0169, BB_2021-0179, BB_2021-0161, AB20-5926 and amendment no. 1 (23 April 2020) and its modifications (22 September 2021) BB_2022-0262, BB_2022-0256; Biobank Borealis of Northern Finland_2017_1013, 2021_5010, 2021_5010 Amendment, 2021_5018, 2021_5018 Amendment, 2021_5015, 2021_5015 Amendment, 2021_5015 Amendment_2, 2021_5023, 2021_5023 Amendment, 2021_5023 Amendment_2, 2021_5017, 2021_5017 Amendment, 2022_6001, 2022_6001 Amendment, 2022_6006 Amendment, 2022_6006 Amendment, 2022_6006 Amendment_2, BB22-0067, 2022_0262, 2022_0262 Amendment; Biobank of Eastern Finland 1186/2018 and amendment 22§/2020, 53§/2021, 13§/2022, 14§/2022, 15§/2022, 27§/2022, 28§/2022, 29§/2022, 33§/2022, 35§/2022, 36§/2022, 37§/2022, 39§/2022, 7§/2023, 32§/2023, 33§/2023, 34§/2023, 35§/2023, 36§/2023, 37§/2023, 38§/2023, 39§/2023, 40§/2023 and 41§/2023; Finnish Clinical Biobank Tampere MH0004 and amendments (21 February 2020 and 6 October 2020); BB2021-0140 8§/2021, 9§/2021, §9/2022, §10/2022, §12/2022, 13§/2022, §20/2022, §21/2022, §22/2022, §23/2022, 28§/2022, 29§/2022, 30§/2022, 31§/2022, 32§/2022, 38§/2022, 40§/2022, 42§/2022 and 1§/2023; and Central Finland Biobank 1-2017, BB_2021-0161, BB_2021-0169, BB_2021-0179, BB_2021-0170, BB_2022-0256, BB_2022-0262 and BB22-0067. Decision was made allowing continuation of data processing until 31 August 2024 for the following projects: BB_2021-0179, BB22-0067, BB_2022-0262, BB_2021-0170, BB_2021-0164, BB_2021-0161 and BB_2021-0169; and Terveystalo Biobank STB 2018001 and amendment 25 August 2020, Finnish Hematological Registry and Clinical Biobank decision 18 June 2021, Arctic biobank P0844: ARC_2021_1001.
Phenotypic definitions in FinnGen and UKBB
Exact definitions of International Classification of Diseases (ICD; 8th^40,41^, 9th^42,43^ and 10th^44,45^ edns) diagnoses, medications and procedures for all FinnGen phenotypes are publicly available at https://risteys.finregistry.fi/ using the tag names listed in the summary below. To create a large but specific set of individuals with AIHT, we first collected all individuals with 1+ years of levothyroxine purchases (H03AA01), ICD-10 codes E03[89]x, ICD-9 244[89]X and ICD-8 244[99,00]. From the cases, we then excluded anyone with thyrotoxicosis E05[01289], thyroidectomy (NOMESCO code BAA60), postsurgical hypothyroidism (E89.0[19]), pituitary tumor (D35.2), panhypopituitarism (E23.00), hypopituitarism (E23.08), hypogonadotropic hypogonadism (E32.04), lack of adrenocorticotropic hormone (E23.03) or deficiency of growth hormone (E23.01). Controls were everyone else in FinnGen, excluding those with any of the autoimmune codes in Supplementary Table 11 or listed at https://risteys.finngen.fi/endpoints/AUTOIMMUNE. Detailed descriptions of the FinnGen phenotype can be found at https://risteys.finregistry.fi/endpoints/E4_HYTHY_AI_STRICT.
The UKBB phenotype was created by executing the FinnGen endpoint definition code using the identical ICD codes for inclusion and exclusion. Details specific to UKBB include levothyroxine obtained from ‘treatment/medication code’ (20003) and ‘GP prescription records’ (42039) and self-reported hypothyroidism or myxoedema (1226). In addition, to match FinnGen exclusion criteria in the UKBB, thyroidectomy was defined as operation code 1432 or operative procedures B081–B084 (main or secondary OPCS4).
To discriminate autoimmune versus thyroid loci, we created a set of individuals with autoimmune disease but who did not have hyperthyroidism or AIHT. The complete list of selected autoimmune diseases is listed at https://risteys.finregistry.fi/endpoints/AUTOIMMUNE. For the phenotype AUTOIMMUNE_NONTHYROID in the UKBB, from the selected individuals with autoimmune disease, we then excluded those with thyroidectomy, use of levothyroxine or carbimazole, any individuals in the strict AIHT cases described above and those with AIHT (ICD-10: E05[0 | 9] ICD-9: 2420).
The skin cancer endpoint used incorporated all melanoma and nonmelanoma skin cancers and can be viewed at https://risteys.finngen.fi/endpoints/C3_SKIN_EXALLC. These captured all instances of skin cancers recorded in hospital discharge or death registries (ICD-10: C43, C44, ICD-8 or ICD-9: 172–173) and cancer registry (ICD-O-3: C44). The combined FinnGen–UKBB scan has 68,822 cases and 657,740 controls and is available at https://metaresults-ukbb.finngen.fi/pheno/C3_SKIN_EXALLC.
Graves’ disease was defined using thyrotoxicosis with diffuse goiter (ICD-10: E05.0, ICD-9: 2420) https://risteys.finngen.fi/endpoints/E4_THYTOXGOITDIF. The combined FinnGen–UKBB scan has 6,550 cases and 823,242 controls and is available at https://metaresults-ukbb.finngen.fi/pheno/E4_THYTOXGOITDIF.
Meta-analysis and definition of LD-independent associations
Array-based genotype data in FinnGen were called and subjected to variant and sample-level quality control, followed by phasing and imputation (using a panel of 8,554 deeply sequenced Finnish whole genomes) using Eagle 2.3.5 and Beagle 4.1 (described further at https://finngen.gitbook.io/documentation/methods/genotype-imputation/genotype-imputation). In FinnGen DF12, this project-wide process resulted in a total of 500,348 individuals after removal of related individuals and non-Finnish ancestry people and were used in all FinnGen analyses. FinnGen data analysis pipelines are freely available at https://github.com/FINNGEN/; the FinnGen Handbook, https://finngen.gitbook.io/documentation/, contains a detailed description of data production and analysis, including code used to run analyses. GWAS analysis was performed using REGENIE 2.2.4 and a logistic mixed model adjusted for age, sex, genotyping batch and the first ten principal components (PCs) of ancestry with an approximate Firth test for robust effect size estimation. UKBB analysis was performed using the pipeline implemented by the Pan UKBB project^46^ (https://pan.ukbb.broadinstitute.org/) using SAIGE with age, sex, age × sex, age^2^, age^2^ × sex and 10 ancestry PCs. Meta-analysis of the FinnGen R12 and UKBB European ancestry subset (n = 420,531 after quality control and population clustering described on the Pan UKBB website) was performed using inverse-variance weighted meta-analysis.
As high-resolution fine-mapping algorithms have not been shown to be fully reliable in the context of meta-analyses, particularly when performed using different genotyping and imputation techniques^47^, we opted conservatively to flag only index variants representing the most significantly associated variant in confidently LD-independent loci (Supplementary Table 1). Many papers use a fixed threshold such as r^2^ < 0.05 to define LD independence, but, with numerous associations with P values far below 1 × 10^−100^; such a threshold is inadequate and false but apparently genome-wide significant peaks will occur. We defined a stricter definition of LD-independent associations as follows:
- Starting with the most significant association with P < 5 × 10^−8^ on each chromosome, around each genome-wide significant variant, a ±2-Mb window was screened.
- A dynamic LD threshold for each association is defined as T = min (0.1, r5), where r5 is defined as the r^2^ value at which the expected residual χ^2^ would be 5.0. This threshold is trivially computed because the expected χ^2^ of an LD neighbor is r^2^ × χ^2^ of the causal variant, so conservatively sets a dynamic threshold that leaves expected signal vastly below genome-wide significance.
- Secondary associations were therefore counted as independent only if they were genome-wide significant and r^2^ to any more significant association was <T. As a result of potential inaccuracy of low values of r^2^, at particularly strong associations where T < 0.02 (that is, residual association signal may exist even at very low values of r^2^), secondary associations were not defined within 1 Mb of such signals. Nearby signals were confirmed as independent using full conditional analysis in FinnGen on top signals using REGENIE and, owing to the limited accuracy of pairwise LD inference beyond two signals, we only reported two signals within 1 Mb here with the exception of a handful of examples where three signals within 1 Mb were all conditionally independently associated (the third significant after conditioning on the first two simultaneously) at genome-wide significance in FinnGen. Full conditional regional analyses can be browsed at r12.finngen.fi.
ZAP70 functional studies
ZAP70-deficient Jurkat cells were obtained from American Type Culture Collection. Cells were reconstituted with ZAP70 (wild-type (WT), Thr155Met, Tyr315Ala&Tyr319Ala) and stimulated with anti-CD3 and anti-CD28 (1 μg ml^−1^). Cells were fixed and permeabilized with Cytofix or Cytoperm buffer (BD Biosciences) and stained with the indicated antibodies. Flow cytometry was performed on a CytoFLEX LX (Beckman Coulter) flow cytometer and analysis was performed using FlowJo software.
Cell frequencies in each indicated gate are reported as percentages of total live singlet cells. No sorting was performed. Live cells were selected based on forward scatter (FSC) and/or side scatter (SSC) profiles. Singlets were selected based on FSC-H/FSC-A profiles. Quadrant plots were generated for the indicated antibody markers.
Bayesian classification of association results
Linemodels (https://github.com/mjpirinen/linemodels) was used to explore the existence of, and classify individual variants into, clusters based on bivariate effects. Using as input the (β, s.e.) pairing from two GWAS analyses for a set of variants, linemodels probabilistically clusters variants into groups, providing both a likelihood of each number of groups and posterior probability of assignment to each group. Linemodels consists of three parameters: scale (the magnitude of effect), slope (the multiplicative relationship between the effects on each phenotype) and correlation (the expected consistency with the expected values). As recommended, β and s.e. were transformed into √(heritability) scale by multiplying both by √(2× MAF × (1 − MAF)) before fitting linemodels. The initial value of the scale parameter is set such that 95% of the effect sizes are within twice the scale parameter for all groups. We also chose a correlation parameter of 0.99 to permit modest deviation from the exact best-fit slope. Models with one or two slopes were fit using the EM-algorithm implemented in linemodels and the likelihood ratio test used to compare models. In two-line models, the scale parameters were fixed to be equal. For the comparisons involving AInonT and SKIN, one of the slopes was set to 0 to capture only those variants that belong to AIHT, whereas the other slope was optimized with an EM-algorithm. The slopes for shared groups were 0.43 for AInonT and −0.37 for SKIN. When running linemodels for TSH, both slopes were allowed to be optimized (because AIHT is so common, purely autoimmune associations will induce a residual effect on TSH population wide) and were found to be 0.13 and 0.99. After optimizing slopes, we used an iterative Gibbs Sampler to assign group probabilities.
PRS PheWAS
We conducted a phenome-wide association study (PheWAS), investigating the associations between a PRS for hypothyroidism, derived from the UKBB data^48^, and 4,739 phenotypes from the FinnGen R11. We excluded 149 endpoints with <50 cases from the analysis. We applied logistic regression with LDLT decomposition using the ‘fastglm’ R package (https://CRAN.R-project.org/package=fastglm). The association between the standardized PRS vector and each endpoint was adjusted for sex, age, age^2^, the first six PCs and genotyping array features.
Our primary focus was the relationship to PRS, with other factors included to address potential residual confounding, but providing no additional meaningful information for our study. Sex-specific traits were run in only the appropriate sex. We set Bonferroni’s threshold at P < 1.05 × 10^−5^ (0.05/4,739) in consideration of the multiple tests examined.
In addition to the described experiment, we conducted two modified PheWAS analyses to further explore our findings. The first, an ‘exclusion-PRS-PheWAS’ ^49^, aimed to determine whether the secondary trait associations with the hypothyroidism score were influenced by overlapping samples between the hypothyroidism endpoint (E4_HYTHY_AI_STRICT) and studied phenotype. For this, we removed hypothyroidism cases from the analysis and conducted a PheWAS on such filtered cohorts. The second study, a ‘noMHC-PRS-PheWAS’, examined whether the associations that we discovered were driven by the presence of the MHC locus. In this study, we excluded variants located in the MHC region (6:28510120–33480577; GRCh38) from the PRS model and then assessed the phenome-wide associations using the modified PRS.
Reporting summary
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.
Online content
Any methods, additional references, Nature Portfolio reporting summaries, source data, extended data, supplementary information, acknowledgements, peer review information; details of author contributions and competing interests; and statements of data and code availability are available at 10.1038/s41588-026-02521-1.
Supplementary information
Reporting Summary Peer Review File Supplementary TableSupplementary Tables 1–12.
The reference list from the paper itself. Each links out to its DOI / PubMed record.
- 1International Classification of Diseases Eighth Revision (WHO, 1965).5643638 · pubmed ↗
- 2Classificatio Morborum et Causarum Mortis: Tauti- ja kuolinsyyluokitus / Klassifikation av sjukdomar och dödsorsaker [Classification of Diseases and Causes of Death] (Lääkintöhallitus (National Board of Health), 1969).
- 3International Classification of Diseases Ninth Revision (WHO, 1977).695541 · pubmed ↗
- 4Tautiluokitus 1987, Osa 1: Systemaattinen osa / Klassifikation av sjukdomar 1987, Del 1: Systematisk del [Classification of Diseases 1987, Part 1: Systematic Part] (Lääkintöhallitus ja Sairaalaliitto (National Board of Health and Hospital League), 1986).
- 5International Statistical Classification of Diseases and Related Health Problems Tenth Revision (WHO, 1992).3376487 · pubmed ↗
- 6Terveyden ja hyvinvoinnin laitos (National Institute for Health and Welfare) Tautiluokitus ICD-10 / Klassifikation av sjukdomar ICD-10 [Classification of Diseases ICD-10] 3rd edn (St Michel Print, 2011).
