Genome-wide DNA methylation patterns for indicators of liver steatosis: a longitudinal multiomic study
Jo Ciantar, Sonja Rajić, Daria Kostiniuk, Ella Raulamo, Noora Kartiosuo, Liye Lai, Pashupati P. Mishra, Leo-Pekka Lyytikäinen, Marcus E. Kleber, Suvi Rovio, Juha Mykkänen, Katja Pahkala, Annette Peters, Juliane Winkelmann, Winfried März, Mika Kähönen, Olli Raitakari

TL;DR
This study identifies DNA methylation patterns in blood linked to liver steatosis, revealing age-related changes and gene associations over time.
Contribution
The study reveals methylation markers for liver steatosis and their age-dependent associations, offering new insights into liver disease pathophysiology.
Findings
Lower methylation at cg06690548 (SLC7A11) is associated with liver steatosis in the Young Finns Study.
Nine CpGs associate with GGT and 23 with FLI across multiple cohorts, with strongest links in older age groups.
Gene expression near ABCG1 and SREBF1 mediates associations with FLI, but meQTLs do not modulate these links.
Abstract
To identify blood DNA methylation profiles related to liver steatosis, we performed an EWAS on the presence of ultrasonically-identified liver steatosis in the Young Finns Study (YFS) participants (n = 1529, 33–50y.), and on liver enzyme levels and fatty liver index (FLI) across three discovery cohorts: YFS, LURIC (n = 2371, 17–92y.) and KORA FF4 (n = 1872, 39–88y.). We further investigated the discovered associations across the longitudinal subset of YFS (n = 255), encompassing three follow-ups over 32 years, and the three-generational YFS-3G follow-up in 2018–2020. Finally, we examined the associations of the discovered CpGs with nearby genetic variation and whole blood expression of nearby genes. In YFS, the methylation levels of cg06690548 (SLC7A11) were lower in individuals with liver steatosis (Δbeta = − 0.011, FDR = 0.004). Methylation of 9 CpGs associated with GGT and 23 CpGs…
Genes, proteins, chemicals, diseases, species, mutations and cell lines named across the full text — each resolved to its canonical identifier and authoritative record.
Click any figure to enlarge with its caption.
Figure 1
Figure 2
Figure 3
Figure 4- —Tampere University (including Tampere University Hospital)
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsEpigenetics and DNA Methylation · Liver Disease Diagnosis and Treatment · Genetic Associations and Epidemiology
Background
Steatotic liver disease is a spectrum of liver conditions with various aetiologies, ranging from mild fat infiltration to steatohepatitis, fibrosis and cirrhosis [1]. The most common subtype, known as non-alcoholic fatty liver disease (NAFLD), has recently been revised as metabolic dysfunction-associated steatotic liver disease (MASLD) to emphasise the central role of metabolic dysfunction in the pathogenesis of the disease [1]. MASLD is defined as over 5% liver fat content accompanied by at least one cardiovascular risk factor [1], and about 99% of patients diagnosed with NAFLD meet this criterion [2]. Affecting 25–40% of the global population, MASLD’s alarmingly increasing prevalence is contributing to the growing liver transplantation burden [3–5]. MASLD is the leading cause of chronic liver diseases and is a risk factor for cardiovascular diseases, overall mortality, and decreased quality of life [6, 7]. It is a reversible condition, making early identification and lifestyle interventions crucial to prevent its progression to chronic and severe liver diseases, such as steatohepatitis, cirrhosis, and hepatocellular carcinoma [3–5].
The increase in the prevalence of MASLD parallels with the rise in sedentary behaviour, obesity, insulin resistance, and metabolic syndrome, but the pathophysiology and contributing factors are not fully understood [8, 9]. Insulin resistance is a key component in MASLD development that increases de novo lipogenesis and enhances fatty acid transport from adipose tissue to the liver [10]. Similarly, obesity increases circulating free fatty acid levels contributing to MASLD development and progression [8]. While alcohol-related liver disease (ALD) and NAFLD have been considered distinct conditions since 1980, both share similar pathophysiological mechanisms [11–13]. To acknowledge this, the revised terminology also introduced “metabolic dysfunction and alcohol-associated liver disease” (metALD), capturing the coexistence of two distinct aetiologies without excluding their shared pathophysiology [1].
DNA methylation is a reversible epigenetic mechanism that regulates gene expression and can have phenotypic effects. It has been shown to mediate the adaptation to many environmental exposures, such as changes in nutritional status and metabolic diseases [14]. Liver biopsies of NAFLD patients exhibit significantly lower global DNA methylation levels in comparison to biopsies from obese controls, with methylation levels correlating with severity of NAFLD [15]. Furthermore, Ahrens et al. have reported changes in liver DNA methylation patterns in genes related to intermediary metabolism and insulin signalling, which also correlated with NAFLD severity [16].
Peripheral blood DNA methylation has been suggested to be a potential biomarker for liver steatosis [17]. Previous EWASes have shown that methylation levels in CpGs of several solute carriers (SLCs) and phosphoglycerate dehydrogenase (PHGDH) are robustly associated with gamma-glutamyltransferase (GGT) levels [18] and identified 22 CpGs, including several CpGs in SLCs and PHGDH, associated with NAFLD [19]. These studies have relied on a single discovery cohort (n < 1500), with the significant findings subsequently replicated in several smaller cohorts. Several studies have explored whether changes in DNA methylation profiles associated with liver steatosis are causal or precede the onset of metabolic diseases. Nano et al. linked solute carrier family 7 member 11 (SLC7A11) to the expression of lipid-associated genes and showed the causality of CpGs associated with GGT using Mendelian randomisation [18]. Ma et al. showed that some CpGs associated with NAFLD were predictive of the onset of type 2 diabetes (T2D) [19]. However, both studies measured DNA methylation at a single timepoint, making it impossible to investigate the dynamic changes in methylation profiles over time. To address this gap, Zeng et al. conducted a longitudinal analysis of peripheral blood DNA methylation and liver fat content in 96 individuals with approximately five-year follow-up. Their results indicated that DNA methylation in SREBF1, ADBCG1, DHCR24, CTPT1A and LINC00649 may predict the change in liver fat content, but the associations between SREBF1 and ADBCG1 were largely influenced by body weight [20].
Here, we set to investigate the association between whole blood DNA methylation, measured with the Illumina EPIC array, and ultrasonically identified liver steatosis (MASLD and metALD) using the 2011 follow-up of the Young Finns Study (YFS, n = 1529). These findings were replicated across three generations in the 2018 YFS-3G follow-up, which included the original YFS cohort (G1, n = 1273), their parents (G0, n = 1342), and their offspring (G2, n = 1309). We also investigated whether the methylation levels of CpGs that were associated with liver steatosis predicted its presence in 2018 prospectively in 2011 and 1986 in a subset of the YFS participants with longitudinal DNA methylation data (n = 255). In addition, we aimed to discover CpGs associated with three separate liver enzymes and fatty liver index (FLI) in at least two of the discovery cohorts–YFS 2011 follow-up (n = 1529), German cohorts Cooperative Health Research in the Region of Augsburg (KORA, n = 1872), and the LUdwigshafen RIsk and Cardiovascular Health study (LURIC, n = 2371)—and validate these associations in the three-generational YFS-3G 2018 follow-up. Finally, we investigated whether methylation quantitative trait loci (meQTLs) influenced the methylation levels of the discovered CpGs and whether the methylation levels were associated with the expression of nearby genes in whole blood.
Materials and methods
Cohorts and different analysis settings utilised are described in Fig. 1. Throughout the text and tables hg19/hg37 genetic coordinates are utilized.Fig. 1A Cohorts and B study settings utilised in the study. Out of the discovery cohorts, the information on the presence of steatotic liver was available only from the YFS 2011 and 2018 follow-ups, while liver enzyme levels and FLI were available also from KORA and LURIC. The longitudinal study setting is comprised of 255 individuals from whom we have DNA methylation data available from the three follow-ups spanning over 30 years, while in the multigenerational setting we utilise DNA methylation data available from the follow-up cohort (G1) and their parents (G0) and children (G2), all of whom participated in the 2018 follow-up of YFS. Abbreviations: G = generation, EWAS = Epigenome wide association study, GGT = gamma glutamyl transferase, FLI = fatty liver index, meQTL = methylation quantitative trait loci, eQTM = expression quantitative trait methylation
YFS
The YFS is a multicentre follow-up study on cardiovascular risk from childhood to adulthood in Finland. The study was launched in 1980, with 3596 children and adolescents (aged 3–18 years) participating in the baseline study [21]. Participants have been followed through several examinations, including comprehensive risk factor assessments with major follow-ups in 1986, 2001, 2007, 2011 and 2018–2020. The 40-year follow-up was conducted in 2018 and expanded to include parents (G0) and children (G2) of the original YFS participants (G1) [22].
DNA methylation was successfully profiled for 311 study participants in the 1986 follow-up, 1714 in the 2011 follow-up, and 1311 in the 2018 follow-up, with 255 original study participants having DNA methylation data available for all three time points. In addition, DNA methylation data was available from the 2018 follow-up for 1343 parents (G0) and 1309 children (G2). See Supplementary Table 1 for the demographics of the utilised populations.
KORA
The KORA FF4 study is the second follow-up of the KORA S4 study, conducted in 2013–2014. Of the 4261 participants of the S4 baseline study, 2279 also participated in the FF4 14-year follow-up study. In both follow-up surveys, participants completed a lifestyle questionnaire and underwent standardised examinations including blood samples collection, as described previously [23, 24]. DNA methylation profiling was successful for 1872 participants of the KORA FF4 study (39–88y.).
LURIC
LURIC is a patient cohort of 3316 German individuals aged 17–92 years, who underwent coronary angiography between 1997 and 2000 and have since been followed up on non-fatal events after five years, and on all-cause and cause-specific mortality after 10 years [25]. DNA methylation profiling was successful for 2371 participants of the LURIC study.
DNA methylation
For the DNA methylation analysis of the YFS cohort, leukocyte DNA was obtained from EDTA blood samples collected during the 1986 and 2011 follow-ups using a Wizard® Genomic DNA Purification Kit (Promega Corporation, Madison, WI, USA). Samples in the 2018 follow-up were processed using the Perkin Elmer Chemagic (CMG-1074) according to the manufacturer’s instructions, except for utilizing modified binding buffer 2 with a subset of the samples.
In YFS, 181 of the 1986 samples and 188 of the 2011 samples were analyzed with Illumina Infinium HumanMethylation450 BeadChips. The 1986 samples were processed at the Helmholtz Zentrum in Munich, Germany, and the 2011 samples in the Core Facility at the Institute of Molecular Medicine Finland (FIMM), University of Helsinki, according to the protocol by Illumina, as described earlier [26]. 130 of the YFS 1986 samples and 1526 of the YFS 2011 samples were analyzed with the Illumina Infinium MethylationEPIC v1 BeadChip at Helmholtz Zentrum Munich, Germany as described earlier [27]. Samples were applied to the arrays in a randomised order. Aliquots of 1 μg of DNA were subjected to bisulphite conversion, and a 4 μl aliquot of bisulphite-converted DNA was subjected to whole-genome amplification, followed by enzymatic fragmentation and hybridization onto an Illumina Infinium MethylationEPIC BeadChip. The arrays were scanned with the iScan reader (Illumina). All pre-processing steps were performed using functions implemented in the minfi R/Bioconductor package [28]. Cross-reactive probes [29, 30] and probes with SNPs given by minfi were removed. For discovery analysis 769,683 probes from 1526 samples with profiling done with EPIC v1 were utilized.
For the YFS 2018–2020 follow-up, blood DNA methylation profiling was performed for 3963 participants, spanning all three generations (G0 n = 1343, G1 = 1311, and G2 n = 1309). In short, methylation profiling was performed with the Illumina Infinium MethylationEPIC BeadChip v1 and v2 at Helmholtz Zentrum, Munich, Germany. Samples were applied to the arrays in a randomised order. Aliquots of 1 μg of DNA were subjected to bisulphite conversion, and a 4 μl aliquot of bisulphite-converted DNA was subjected to whole-genome amplification, followed by enzymatic fragmentation and hybridization onto an Illumina Infinium MethylationEPIC BeadChip (version 1 and 2). The arrays were scanned with the iScan reader (Illumina). Background subtraction was performed on all probe values using the bgcorrect.illumina function in the minfi package. Probes with a detection p-value > 10e–16 and samples with a sample call rate of more than 95% were filtered out. Autosomal probes were separated into six groups by probe-type and quantile normalization was performed separately for each group. Samples for which the actual sex did not match the predicted sex were excluded and quality control was performed on the data. All pre-processing steps were performed using functions implemented in the minfi, sesame or limma R/Bioconductor package. For 255 individuals with available methylation data from all three YFS follow-ups, data from 1986 and 2011 was renormalized with sesame to enable longitudinal analysis.
For KORA genome-wide DNA methylation was performed using the Illumina Infinium MethylationEPIC v1 BeadChip® for 1,928 KORA FF4 samples. Genomic DNA (750 ng) was bisulphite-converted using the EZ-96 DNA Methylation Kit in two separate batches (n = 488, n = 1440). A subsequent methylation analysis was performed on an Illumina iScan platform according to standard protocols provided by Illumina. GenomeStudio software (version 2011.1) with Methylation Module v1.9.0 was used for an initial quality control of assay performance and for the generation of methylation data export files. Further quality control and preprocessing of the data were performed in R v3.5.1 with the package minfi v1.28.3, and following primarily the CPACOR pipeline as described earlier [31]. Cross-reactive probes [29, 30], probes with SNPs with a minor allele frequency of > 5% at the CG position or the single base extension as given by minfi; and with > 5% missing values were removed. 1872 samples and total of 697,732 probes remained for analysis.
For LURIC, the DNA methylation levels were quantified using the Illumina Infinium MethylationEPIC BeadChip according to the manufacturer’s protocols. In the LURIC study, quality control was implemented using the CPACOR pipeline as described earlier [32]. A total of 795,619 autosomal probes from 2423 samples were included in further analyses. CpGs located in close proximity (1–2 bp) to a genetic polymorphism in the European population with a frequency of > 0.01% as well as cross-reactive probes [33] and probes with a detection P-value of > 0.05 in at least 1% of the samples were removed using the rmSNPandCH function in the DMRcate package [34].
For all datasets, methylated and an unmethylated signal count per CpG site are obtained and combined into β-values, defined as the ratio of the methylated signal intensity divided by the overall signal intensity: β-value = M/(M + U + α), where an offset was added as a regularization for the situation when both M and U are low, as recommended by Illumina [35]. Untransformed beta values were utilized in all the analyses [36]. 689,101 probes overlapped with all the three discovery cohorts.
Genome-wide genotyping in the YFS
Genomic DNA was extracted from peripheral blood leukocytes using a commercially available kit and a Qiagen BioRobot M48 Workstation according to the manufacturer’s instructions (Qiagen, Hilden, Germany). Genotyping was performed using a custom-built Illumina Human 670 k BeadChip at the Welcome Trust Sanger Institute, as described earlier [37]. Genotype imputation was performed using TOPMed r2 as reference [38].
Genome-wide blood transcriptomics analysis in the YFS
From the 2011 follow-up of YFS (YFS 2011), whole blood was collected into PaXgene Blood RNA Tubes (PreAnalytix) and isolated with a PaXgene Blood microRNA Kit (Qiagen), including the DNase Set with the QiaCube according to the manufacturer’s instructions. The expression levels were analysed with an Illumina HumanHT-12 version 4 Expression BeadChip (Illumina Inc.) containing 47,231 expression and 770 control probes as described earlier [39]. 200 ng of RNA was reverse-transcribed into cDNA and biotin-UTP-labelled using the Illumina TotalPrep RNA Amplification Kit (Ambion); 1500 ng of cDNA was then hybridized onto the Illumina HumanHT-12 v4 Expression BeadChip. The BeadChips were scanned with the Illumina iScan system. Raw Illumina probe data were exported from GenomeStudio and analysed in R using the Bioconductor packages. The expression data were processed using non-parametric background correction, followed by quantile normalisation with control and expression probes using the neqc function in the limma package, and log2 transformation. The expression analysis was successful in 1364 samples with DNA methylation data also available.
Liver ultrasound in the YFS
In the 2011 and 2018 follow-ups, ultrasound imaging and steatosis evaluation were performed using a validated protocol, employing Sequoia 512 ultrasound mainframes (Acuson) with 4.0 MHz adult abdominal transducers. The diagnostic evaluation of liver steatosis was performed visually using a standard system by a trained sonographer according to the following criteria: liver-to-kidney contrast, parenchymal brightness, deep beam attenuation, and bright vessel walls. Participants were further classified into those with detectable liver steatosis and those with a normal liver.
Liver enzymes
In YFS, serum alanine aminotransferase (ALT), aspartate aminotransferase (AST), and GGT concentrations were measured with System Reagent (Beckman Coulter Biomedical). In KORA, the GGT, AST, and ALT levels were analysed according to the recommendations of the International Federation of Clinical Chemistry (IFCC) from 1983, including the optimization of substrate concentrations, the employment of NaOH, glycylglycine buffer, and sample start. Pyridoxal phosphate was applied in the assessment of ALT and AST [40]. In LURIC, GGT, ALT, AST, and alkaline phosphatase (AP) were determined using an enzymatic assay (Hitachi 717, Roche, Mannheim, Germany). Liver enzymes were measured at 25 °C. Reference values were defined as follows: GGT < 29 U/l, ALT < 23 U/l, AST < 19 U/l, AP 55–170 U/l [41]. Fatty liver index (FLI) was calculated for all the cohorts using the formula [42].
\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$ FLI = \left( {e^{\begin{subarray}{l} 0.953*\log e\left( {triglycerides} \right) \\ + 0.139*BMI + 0.718*\log e\left( {GGT} \right) \\ + 0.053*waist{\kern 1pt} circumference - 15.745 \end{subarray} } } \right)/(1 + e^{\begin{subarray}{l} 0.953*\log e(triglycerides \\ + 0.139*BMI + 0.718*\log e(GGT) \\ + 0.053*waist{\kern 1pt} circumference - 15.745 \end{subarray} } )*100 $$\end{document}Covariates
In the YFS 2011 and 2018 follow-ups, alcohol consumption data were acquired with standardized questionnaires and calculated in standard doses (12 g pure ethanol per day) by dividing the total number of doses consumed per week (0.33 L doses of beer or cider, 0.12 L doses of wine, and 0.04 L doses of hard liquor) by 7 [43]. In the YFS 1986 follow-up, participants were divided into those who drank any alcohol weekly and those who did not. For the KORA study, alcohol intake was also self-reported through a standardised interview by trained staff (g/day) [13]. In LURIC, information on alcohol consumption was collected using a questionnaire, which recorded the frequency of beer, wine and spirit consumption. The reported alcohol portions were turned into measures of volume as follows: one bottle of beer: 300 ml; one glass of wine: 100 ml; one measure of spirits: 30 ml (standard sizes that are common in Germany). The average German alcohol contents (5% for beer, 12% for wine, and 43% for spirits) were used to calculate pure alcohol consumption by multiplying the volume by the alcohol percentage and the weight of one millilitre of alcohol (0.8 g/ml) [44]. Smoking information was acquired with questionaries and categorized in YFS and KORA as smoking daily yes/no and current smokers yes/no in LURIC. In all cohorts, weight and height were measured during study visit and body mass index (BMI) calculated BMI = weight/(height*height).
Statistical analysis
For all analyses, GGT, ALT, AST and FLI were inversetransform normalised to account for the skewed distribution of these variables. All EWAS analyses were adjusted with age, sex, BMI, smoking, alcohol consumption, estimated cell type proportions [45] and principal components of the control probes (30 for YFS and LURIC, 10 for KORA). When significant inflation was detected, BACON [46] was used to control bias and inflation in EWAS. Meta-analysis was performed using the inverse-variance-weighted method in METAL [47], combining three discovery cohorts. Individual regression models in the YFS 2018 follow-up and the longitudinal follow-up data were adjusted with age, sex, BMI, smoking, alcohol consumption, cell type proportions, and, when suitable, array type and batch as covariates in the model. Differentially methylated regions (DMRs) were identified using the DMRcate R package [48] as described previously [49], comparing individuals with steatotic liver to those without. FDR-corrected p-values (FDR < 0.05) were considered significant in EWAS and meta-analysis, but nominal p-values were reported in the replication.
To discover genetic variants, possibly affecting the association between the methylation of discovered CpG sites and liver variables, first methylation quantitative loci (meQTLs) were investigated in the GoDMC database [50]. In addition, genetic association studies were performed in YFS 2011 to identify cis-acting genetic variants (meQTLs) within 1 Mb of the CpG sites. All genetic association analyses were performed using PLINK 2.0 [51]. Statistically significant variants (p < 5e−8) with a minor allele frequency of ≥ 0.02 which passed the Hardy–Weinberg equilibrium check (p ≥ 1e−6) and had less than 5% missingness were identified. When multiple variants met these criteria, the variant with the lowest p-value was selected as the top-associating variant. The relationship between the discovered top variant, CpG methylation and liver phenotypes was further analysed with linear regression models predicting the liver variable, including the CpG, genetic variant, age, sex, BMI, smoking, alcohol consumption and cell type proportions in the model.
Association analysis between methylation at the identified CpGs and whole blood gene expression (Expression Quantitative Trait Methylation analysis, eQTM) was performed on gene transcripts within 100 kb of the methylation locus. Target genes were identified using the biomaRt R package [52]. Linear regression analyses predicting transcript expression were adjusted for age, sex, BMI, smoking and alcohol consumption. Mediation analysis was performed if the CpG and a nearby gene, whose expression correlated with the methylation levels (nominal p < 0.05), both associated with the liver variable in question.
Mediation analysis
For each CpG methylation–gene expression pair identified according to the criteria above, a separate mediation analysis was performed to investigate whether the effect of DNA methylation on FLI was mediated through the expression levels of the nearby gene [53]. The assumed causal associations of the mediation question as two directed acyclic graphs are presented in Supplementary Fig. 1. In observational studies, certain assumptions pertaining to confounding are required. Confounding variables (confounders) are the common causes of DNA methylation, gene expression and/or FLI. These so-called exchangeability conditions are:
- No unmeasured confounding of exposure-outcome (DNAm-FLI) relationship
- No unmeasured confounding of mediator-outcome (GE-FLI) relationship
- No unmeasured confounding of exposure-mediator (DNAm-GE) relationship
- No mediator-outcome confounders that are affected by the exposure; i.e., DNA methylation cannot affect any common cause of gene expression and fatty liver index.
If these conditions are fulfilled, the individuals with certain levels of exposure and mediator can be considered as “exchangeable” conditionally on the confounders, i.e., the exposure and mediator can be treated as if they were randomized and confounding can be ruled out as the explanation of any associations observed. Thus, the mediation effects of interest delineated below can be identified based on observational data [54].
Under the exchangeability conditions above, the mediation effects came be obtained from the observed data. In practice, this was accomplished by fitting the following three linear regression models:
\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$ FLI = \eta_{0} + \eta_{1} DNAm + \user2{ \eta }_{2}^{\user2{^{\prime}}} {\boldsymbol{C}} $$\end{document} \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$ FLI = \gamma_{0} + \gamma_{1} DNAm + \gamma_{2} GE + {\boldsymbol{\gamma}}_{{\mathbf{3}}}^{\user2{^{\prime}}} {\boldsymbol{C}} $$\end{document} \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$ GE = \beta_{0} + \beta_{1} DNAm + {\boldsymbol{\beta}}_{{\mathbf{2}}}^{\user2{^{\prime}}} \user2{C }, $$\end{document}where GE denotes the gene expression level, DNAm the DNA methylation, and C includes all covariates adjusted for in each model: age, sex, smoking and alcohol use categorized into 0, 0–1 or more than 1 drink per day. Parameter \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$${\eta }_{1}$$\end{document} descibes the total effect, i.e., the effect of DNA methylation on (normalized) FLI when not accounting for gene expression, while \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$${\gamma }_{1}$$\end{document} corresponds to the direct effect, the effect of DNA methylation on FLI when taking gene expression into account (pathway (a) in Supplementary Fig. 1A). The indirect effect of DNA methylation on FLI via gene expression was calculated utilising the product-of-coefficients approach by multiplying the parameters \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$${\beta }_{1}$$\end{document} and \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$${\gamma }_{2}$$\end{document} (pathways (c) and (d) in Supplementary Fig. 1B), and its standard error was obtained using the multivariate Delta method [55]. The proportion mediated was calculated by dividing the indirect effect with the total effect, and its standard error was also obtained with the multivariate Delta method. Mediation was inferred if the 95% confidence interval of the indirect effect did not include zero. The sample was stratified into normal-weight participants (BMI < 25 kg/m^2^, n = 524) and overweight/obese participants (BMI \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\ge \hspace{0.17em}$$\end{document} 25 kg/m^2^, n = 686) and mediation analyses were conducted separately for both strata.
Results
Methylation levels of cg06690548 (SLC7A11) are associated with presence of liver steatosis
In YFS 2011 (33–50 years, n = 1529), we identified that only the methylation levels of cg06690548 in the body of SLC7A11 were significantly (FDR < 0.05) different between those with and without liver steatosis (nominal p = 5.55 × 10^–9^, FDR = 0.004, regression beta = − 0.018, delta methylation beta = − 0.011, Fig. 2A). This difference in cg06690548 (SLC7A11) methylation levels was also observed after a 7- to 9-year follow-up of the same cohort (YFS 2018 G1, age 41–56 years, nominal p = 0.018, regression beta = − 0.008, delta methylation beta = − 0.012), and a similar methylation pattern was also observed in the parents of the original cohort (YFS 2018 G0, age 59–92 years, nominal p = 4.18 × 10^–13^, regression beta = − 0.016, delta methylation beta = − 0.016) (Fig. 1B). Levels of cg06690548 (SLC7A11) methylation were also slightly lower in individuals with liver steatosis in the children of the original follow-up cohort, but the association was not statistically significant (YFS 2018 G2, age 6–36 years, nominal p = 0.277, regression beta = − 0.005, delta methylation beta = − 0.001).Fig. 2. Methylation of cg06690548 (SLC7A11) in the YFS A 2011 follow-up, where cg06690548 was the only CpG which was statistically (FDR < 0.05) differently methylated in individuals with and without liver steatosis; B 2018 follow-up in the follow-up cohort (G1) and their children (G2) and parents (G0). The difference in methylation levels of cg06690548 (SLC7A11) between those with and without liver steatosis was statistically significant (p < 0.05) in G1 and G0, but not in the younger G2; C longitudinal YFS data from G1 from the 1986, 2011 and 2018 follow-ups. In the smaller subset of individuals with methylation data available from three timepoints (n = 255), methylation levels of cg06690548 (SLC7A11) were already lower in 1986 in individuals who develop MASLD by 2018. Methylation levels of this site decrease in the whole population and the median difference between those who do and do not develop liver steatosis by 2011 slightly increases when the individuals age. After adjusting with age, sex, BMI, smoking, alcohol consumption and estimated cell proportions (as all models), the differences between those who do or do not develop liver steatosis by 2018 were not statistically significant in any timepoint; D YFS 2011 follow-up stratified with alcohol consumption. Association between cg06690548 (SLC7A11) methylation and presence of liver steatosis was non-significant in individuals who reported to consume no alcohol at all and greatest in the subpopulation who reported excess alcohol consumption. *—p < 0.05; ns—p > 0.05
We then analysed the methylation level of cg06690548 (SLC7A11) in a longitudinal setting in a subset of 255 individuals from YFS with DNA methylation data available from 1986, 2011, and 2018. Those who developed liver steatosis by 2018 (n = 98) had lower levels of cg06690548 (SLC7A11) methylation across all three timepoints (delta methylation beta 1986 = − 0.006, 2011 = − 0.016 and 2018 = − 0.011), but the difference was not statistically significant (p> 0.05) in this smaller population subset at any of the timepoints (Fig. 2C). The association between cg06690548 methylation and the presence of liver steatosis remained significant in the YFS 2011 follow-up when investigating individuals who consumed alcohol moderately (≤ 350 g/week for females and ≤ 420 g/week for males^1^) or not at all (nominal p = 6.37 × 10^–4^, regression beta = − 0.004, delta methylation beta − 0.020, n = 1269). However, the association was not significant in individuals who reported not drinking alcohol (nominal p = 0.501, regression beta = − 0.011, delta methylation beta = − 0.003 n = 333). The greatest difference between individuals with and without steatosis was seen in those who consumed alcohol excessively (nominal p = 0.008, regression beta = − 0.036, delta methylation beta = − 0.051, n = 131) (Fig. 2D).
DMR analysis revealed one DMR (chr2:10,184,444–10184650) overlapping the transcription factor KLF11, with three CpGs that were hypomethylated in individuals with liver steatosis in comparison to those without (FDR = 1.63 × 10^–9^, max difference − 0.014, mean difference − 0.009).
Meta-analysis reveals systematic associations between blood DNA methylation and GGT levels and FLI
We identified CpGs associated with liver enzyme levels in three discovery cohorts (YFS 2011, KORA and LURIC) and aimed to replicate these associations again in the three-generational YFS 2018 follow-up to assess their potential longitudinal change and the effect of the age of the cohort on the discovered associations. Only cg00078197 (in the body of HGH1 homolog) and cg14141451 (in the body of scribble planar cell polarity protein SCRIB) were associated with ALT levels in the YFS 2011 follow-up (nominal p = 1.09 × 10^–8^, FDR = 0.008 regression beta = 0.003 and nominal p = 5.75 × 10^–8^, FDR = 0.022 regression beta = 0.018). No significant associations were discovered for ALT in KORA or LURIC. Similarly, no CpG sites were associated with AST levels in any of the discovery cohorts.
For GGT EWAS results, we identified significant inflation in LURIC (λ = 1.47) which was decreased but not fully controlled by the BACON correction (λ = 1.08) and thus we selected only those CpG sites that were found to be FDR-significantly associated with GGT levels in at least two cohorts for further investigation (Table 1). In the three-generational YFS 2018 follow-up, we observed that the association between cg06690548 (SLC7A11), cg26457483 (PHGDH), cg19693031 (TXNIP), and cg14476101 (PHGDH) and GGT levels were stronger in the older generations compared to the younger generations. In contrast, the association between cg06088069 (JDP2) and cg22699725 (PFKFB2) was equally strong in all generations. Interestingly, the association between cg00163198 (SNX19), cg27516100 (DHX16) and GGT levels was significant in G1 and G2, but not in the oldest G0 population (Fig. 3A).Table 1. The association between GGT levels and whole blood DNA methylation in the discovery analysis IlmnIDGeneYFS (n = 1526)KORA FF4 (n = 1872)LURIC (n = 2371)Meta-analysis (n = 5769)BetaSEP-valueFDRBetaSEP-valueFDRBetaSEP-valueFDRZscoreP-valueFDRDirectioncg06690548SLC7A11− 0.0060.0011.18E–070.014− 0.0080.0011.93E–131.34E–07− 0.0140.0011.42E–171.09E–11− 12.3833.22E–352.50E–29−−−cg26457483PHGDH− 0.0080.0025.02E–080.009− 0.0070.0012.30E–070.023− 0.0140.0013.93E–151.52E–09− 10.7873.96E–271.54E–21−−−cg19693031TXNIP− 0.0090.0021.31E–070.014− 0.0090.0013.31E–101.16E–04− 0.0100.0019.51E–117.34E–06− 10.4441.56E–254.05E–20−−−cg14476101PHGDH− 0.0090.0025.79E–080.009− 0.0080.0021.11E–070.015− 0.0120.0011.65E–121.82E–07− 10.3414.61E–258.97E–20−−−cg06088069JDP2− 0.0030.0013.12E–050.265− 0.0040.0011.56E–070.018− 0.0060.0018.53E–152.19E–09− 10.1045.32E–248.28E–19−−−cg00163198SNX190.0060.0011.98E–080.0050.0040.0019.91E–070.0520.0050.0015.22E–091.35E–049.4194.54E–215.89E–16+++cg27516100DHX160.0030.0010.0010.4570.0040.0013.28E–090.0010.0050.0011.88E–097.58E–058.8916.07E–195.91E–14+++cg12973487TCF30.0020.0010.0470.7410.0040.0014.65E–070.0320.0040.0013.47E–070.0027.168.09E–133.00E–08+++cg22699725PFKFB20.0070.0012.21E–090.0020.0050.0014.25E–070.0320.0000.0010.9931.0005.9532.63E–092.12E–05++−Only CpGs associating with GGT significantly (FDR < 0.05) in at least 2 of the cohorts are presented. All analyses are adjusted with age, sex, BMI, smoking, alcohol consumption, estimated cell proportions and technical principal components (PCs). Manhattan plot of the meta-analysis results in Supplementary Fig. 2AFig. 3The associations between DNA methylation levels in CpGs identified and A GGT and B FLI in the three generational YFS 2018 follow-up. Many of the CpGs show stronger associations with the given liver variable in the older G0 (59–93y.) and G1 (43–58y.) populations. Intriguingly, for example, the methylation of cg27516100 (DHX16) associates statistically significantly with GGT or FLI only in the young G2 and G1, and not in the G0 population. All models are adjusted with age, sex, BMI, smoking, alcohol consumption and estimated cell type proportions. NOTE X-axis of the A and B figure are not on the same scale
We further investigated the association between FLI and blood DNA methylation levels in the three discovery cohorts. Similarly, with GGT, LURIC results presented inflation (λ = 1.41), which decreased by BACON (λ = 1.07). After the correction, we discovered 23 sites that were significantly associated with FLI in at least two cohorts. Among them, eight CpGs were also associated with GGT levels in at least two discovery cohorts (Table 2). In addition, this analysis revealed associations between FLI and the methylation in ABCG1 (3 CpGs) and CPT1A (3 CpGs) (Table 2). When analysing the associations between these 23 CpGs and FLI in YFS 2018 follow-up, CpGs locating near or at TXNIP, SLC7A11, PHGDH, JDP2, PDK4, TNIK had stronger negative association in older generations, while a similar but positive pattern could be seen in CpGs locating near or at ABCG1, PHGDH (Fig. 3B).Table 2. The association between fatty liver index (FLI) and whole blood DNA methylation in the discovery analysisIlmnIDGeneYFS (n = 1526)KORA FF4 (n = 1872)LURIC (n = 2371)Meta-analysis (n = 5769)BetaSEP-valueFDRBetaSEP-valueFDRBetaSEP-valueFDRZscoreP-valueFDRDirectioncg06690548SLC7A11− 0.0140.0022.16E–092.38E–04− 0.0080.0011.93E–131.34E–07− 0.0260.0022.33E–228.99E–17− 14.1601.63E–451.27E–39−−−cg19693031TXNIP− 0.020.0032.71E–105.22E–05− 0.0090.0013.31E–101.16E–04− 0.0220.0026.93E–191.07E–13− 14.0021.52E–444.33E–39−−−cg06500161ABCG10.0120.0029.25E–122.49E–060.0010.0010.080.8280.0130.0012.17E–194.19E–1413.9951.67E–444.33E–39+++cg26457483PHGDH− 0.0160.0034.85E–080.004− 0.0070.0012.30E–070.023− 0.0280.0023.34E–242.57E–18− 13.8261.77E–433.44E–38−−−cg14476101PHGDH− 0.0170.0038.57E–080.005− 0.0080.0021.11E–070.015− 0.0250.0028.70E–222.24E–16− 13.5715.99E–429.32E–37−−−cg00574958CPT1A− 0.0080.0012.44E–131.88E–07− 0.00100.0220.74− 0.0050.0012.18E–161.68E–11− 13.0328.03E–391.04E–33−−−cg05325763CPT1A− 0.0090.0027.87E–101.21E–04− 0.0020.0010.0080.664− 0.0080.0019.41E–191.21E–13− 12.7971.72E–371.91E–32−−−cg06088069JDP2− 0.0060.0015.17E–050.216− 0.0040.0011.56E–070.018− 0.0100.0012.23E–182.46E–13− 11.2641.98E–291.92E–24−−−cg17058475CPT1A− 0.0070.0022.04E–060.048− 0.0020.0010.0050.619− 0.0060.0013.01E–141.55E–09− 11.0821.53E–281.19E–23−−−cg10128003ABCG10.010.0028.76E–080.0050.0010.0010.4140.9460.0080.0013.87E–141.86E–0910.4481.50E–258.36E–21+++cg26974062TXNIP− 0.0080.0021.86E–050.142− 0.0030.0018.98E–080.015− 0.0050.0012.48E–060.008− 9.6883.40E–221.65E–17−−−cg16740586ABCG10.010.0025.44E–070.020.0010.0010.30.9220.0120.0024.76E–121.93E–079.6544.71E–222.16E–17+++cg17075888PDK4− 0.0160.0032.68E–070.012− 0.0050.0014.17E–050.247− 0.0110.0021.50E–082.23E–04− 9.6166.83E–222.95E–17−−−cg22699725PFKFB20.0160.0029.72E–122.49E–060.0050.0014.25E–070.0320.0030.0020.0580.7449.2482.28E–207.71E–16+++cg18268027ITPR1− 0.0110.0021.97E–060.047− 0.0040.0014.28E–060.116− 0.0060.0012.45E–106.14E–06− 9.1108.22E–202.46E–15−−−cg27516100DHX160.0050.0020.0030.5160.0040.0013.28E–090.0010.0080.0011.06E–113.72E–078.8301.04E–183.01E–14+++cg20544516SREBF10.0070.0019.03E–070.0270.0010.0010.1080.8490.0080.0011.06E–113.72E–078.7621.91E–185.32E–14+++cg23320029TNIK− 0.0130.0021.12E–070.006− 0.0030.0010.0020.583− 0.0070.0011.92E–082.55E–04− 8.5809.51E–182.55E–13−−−cg02841972− 0.010.0021.21E–060.033− 0.0040.0016.29E–040.493− 0.0090.0013.17E–083.65E–04− 8.5371.38E–173.58E–13−−−cg10919522C14orf43− 0.0110.0026.68E–080.005− 0.0020.0010.0140.707− 0.0080.0011.31E–070.001− 8.4124.03E–179.79E–13−−−cg075049770.0120.0021.68E–060.0430.0020.0010.0820.830.0100.0025.03E–060.0127.8454.34E–157.67E–11+++cg12973487TCF30.0040.0020.0440.7260.0040.0014.65E–070.0320.0080.0016.79E–080.0016.9683.21E–123.09E–08+++cg00858078FOXK20.0110.0021.07E–070.0060.0020.0010.0340.7710.0050.0011.74E–050.0266.8031.03E–118.59E–08+++ Only CpGs associating with FLI significantly (FDR < 0.05) in at least 2 of the cohorts are presented. All analyses are adjusted with age, sex, BMI, smoking, alcohol consumption, estimated cell proportions and technical PCs. Manhattan plot of the meta-analysis results in Supplementary Fig. 2B
In a longitudinal setting, we analysed whether youth methylation levels (G1 1986 follow-up) of the CpGs associating with liver health variables cross-sectionally predicted 2018 GGT or FLI levels (n = 255 for those CpGs that are in 450 K and EPIC array, n = 98 for those CpGs that are only in EPIC array) No statistically significant association were found (p > 0.1 for all analysis).
Association between the CpG sites and GGT or FLI is affected by meQTLs
We first identified meQTLs using the GoDMC database, based on genetic and methylation data from over 32,000 individuals. Of the CpG sites associated with GGT, only cg19693031 (TXNIP) and cg14476101 (PHGDH) had genome-wide level meQTLs in GoDMC. This was expected, as only three of the 9 identified CpGs are present in 450 K array used in the GoDMC analyses. We identified 3 of the 9 CpG sites associated with GGT as having meQTLs in YFS (Supplementary Table 2). To analyse whether the meQTLs were the cause of the association between the CpG methylation and GGT levels, we included the most significant SNP as a cofactor in the model predicting GGT levels in YFS 2011 follow-up data. The inclusion of the SNP did not modify the associations between the CpGs and GGT levels, and none of the meQTLs discovered in YFS had independent association with GGT levels (nominal p > 0.05 for all) (Supplementary Table 2).
Of the 23 CpGs that were associated with FLI in the discovery cohorts, six and nine had a cis-meQTLs in the GoDMC database and in the YFS data, respectively. Six of the meQTLs discovered in the YFS data did not overlap with the CpGs and SNPs discovered for GGT. As with the CpGs and SNPs associated with GGT, all the CpGs that associated with FLI in YFS in the initial analysis also remained significantly associated after adjusting with the most significant meQTL (Supplementary Table 3). None of the discovered meQTLs associated with FLI levels in the YFS (nominal p > 0.05 for all).
Association between cg06500161 (ABCG1) and cg20544516 (SREBF) methylation and FLI was mediated by the expression of nearby genes
In YFS, the methylation levels of all CpGs associated with GGT, except cg06690548 (SLC7A11) and cg00163198 (SNX19), were significantly associated with the whole blood expression of genes located within 100 kb of the methylation locus. Strong negative associations were seen especially between cg27516100 (DHX16) and the expression of IER3, KIAA1949, FLOT1 and PPP1R10; cg26457483 and cg14476101 (PHGDH) and PHGDH expression; cg06088069 (JDP2) and JDP2 expression; cg22699725 (PFKFB2) and PFKFB2 expression. Positive associations were observed between cg27516100 (DHX16) and the expression of MDC1, TUBB, ABCF1, GNL1, MRPS18B and DHX16; cg12973487 (TCF3) and the expression of MBD3 and TCF3. Of the analysed transcripts expression of ZNF697, GNLI and MRPS18B associated with GGT levels with nominal significance (Supplementary Table 4).
CpG sites associated with FLI in the proximity of ABCG1 (cg06500161, cg10128003, cg16740586) and CPT1A (cg00574958, cg05325763 and cg17058475) negatively associated with the expression of multiple transcripts of the corresponding gene. Similarly, cg20544516 (SREBF) negatively associated with two distinct transcripts from this gene and cg02841972 with the expression of three KLF11 transcripts. Further negative associations were seen between cg17075888 (PDK4) and PDK4 expression; cg07504977 and SCD expression; and cg23320029 (TNIK) and TNIK expression (Supplementary Table 5). The expression of several of these transcripts also independently associated with FLI levels (Supplementary Table 5).
We further investigated whether the blood gene expression of the nearby transcripts could mediate the association between CpGs and FLI in the YFS 2011 population. In both normal-weight and overweight/obese subpopulations, mediation by the expression of nearby transcripts was observed for cg06500161 (ABCG1) (proportion mediated: 36% for both strata) and cg20544516 (SREBF) (proportion mediated: 11% and 20% for normal-weight and overweight/obese, respectively). In addition, a mediation effect was indicated only in the overweight/obese subpopulation with cg27593172 and BANK1 transcript, however, the directions of direct and indirect effect were different, which could reflect an alternative mechanism rather than gene expression regulation by DNA methylation (Supplementary Table 6, Fig. 4).Fig. 4. Nearby transcripts mediating the effect of CpG methylation on FLI in the YFS 2011 follow-up. Methylation levels of cg06500161 (ABCG1) and cg20544516 (SREBF1) on FLI were mediated by the blood expression of nearby transcripts in both the normal-weight (n = 542) and overweight/obese (n = 686) subpopulations, while mediation effect of cg27593172 (E) on BANK1 expression could only be seen in the overweight/obese subpopulation. In E, the direction of direct effect and indirect effect are different, indicating a more complex association than direct regulation of gene expression by methylation
Discussion
The presence of ultrasonically-identified steatotic liver associated only with the methylation levels of cg06690548 in SLC7A11. This finding was replicated two of the oldest generations (G1 and G0) but not in the youngest generation (G2) of the three-generational YFS 2018 follow-up. Meta-analysis revealed highly replicative findings between blood DNA methylations levels and GGT and FLI. Most of these associations were stronger in the older generations of the YFS 2018 follow-up, while only a few associated with liver variables in the youngest participating generation (age 6–36 years). Although we found several meQTLs associating with these CpGs, the association between the methylation of the CpGs and the liver variables was independent of genetic variation. We further report associations between the discovered CpGs and whole blood gene expression of nearby genes, indicating a possible biological function of the methylation changes.
Methylation at cg06690548 in SLC7A11 was the only CpG site associated with the presence of liver steatosis in YFS 2011. The methylation of this site was also strongly associated with GGT levels and FLI in the meta-analysis. Previously, methylation at this CpG has been associated with liver steatosis [18], NAFLD [19], and the levels of GGT and ALT [18] in mostly European populations. However, it did not associate with liver fat content in a Chinese population [20]. We observed that methylation levels of this CpG were already lower in individuals in their twenties who later developed liver steatosis by their fifties; however, in general, the association of this CpG and liver variables was stronger in the older population (G0). On the other hand, Melton et al. did not observe an association between cg06690548 methylation and steatosis score or NAFLD in adolescence (mean age 17 years) [56]. Although Nano et al. suggest that methylation in SLC7A11 could be causal for elevated GGT levels, our results indicate that the changes in methylation of this site are accompanied by the worsening metabolic health. In line with this hypothesis, methylation of cg06690548 has been previously associated with BMI [57–60], CRP, glucose [57, 61] and insulin levels [57, 61], T2D [62], triglyceride levels [57] and alcohol consumption [63–66]. Even though we adjusted all our analyses with age, BMI and self-reported alcohol consumption, we cannot rule out that these confounding factors could be causal for the observed changes in methylation at this site, rather than the liver condition itself. The association between liver steatosis and cg06690548 methylation levels was observed in the subpopulation of YFS who were modest alcohol consumers, but not in those who reported no alcohol consumption. These results do not rule out the possibility that the association between cg06690548 methylation and liver steatosis is influenced by alcohol consumption; however, due to the small number of individuals in analysis (n = 333) and especially the number with liver steatosis (n = 44), the lack of association in the teetotalers could also be due to limited study power.
SLC7A11 is an amino acid transporter of cysteine and glutamate and it has a well-established role in protecting cells from ferroptosis [67]. It has been implied to counteract effects of oxidative stress and maintain triglyceride metabolism in primary hepatocytes [68]. Knocking down SLC7A11 from liver cells lead to a significant decrease in lipid-associated genes and alterations in lipid metabolism, indicating that SLC7A11 has a role in maintaining lipid homeostasis in the liver [18]. Even though we, or Ma et al. [19], could not identify an association between cg06690548 and SLC7A11 expression in blood, Vallegra et al. [69] report an association utilising summary-based mendelian randomisation (SMR) analysis [70]. Negative correlation has been observed between SLC7A11 methylation and gene expression in hepatocellular carcinoma [71] and Simner et al. demonstrated that decreasing methylation via 5-AZA-dC treatment in trophoblast-derived BeWo cells increased SLC7A11 levels [72], both indicating that DNA methylation could regulate SLCA11 expression.
We discovered one DMR between individuals with and without liver steatosis in YFS 2011. Although none of the CpGs within the DMR chr2:10184444–10184650 were individually statistically significantly associated with the presence of liver steatosis or other liver health indicators investigated here, methylation of nearby cg02841972 (chr2:10176151) associated with FLI levels and with the expression of KLF transcription factor 11 (KLF11), overlapping the DMR (chr2: 10183677–10194963). KLF11 is a transcription factor that regulates glucose metabolism and insulin secretion. Rare mutations in KLF11 are thought to be causal for maturity-onset diabetes of the young type 7, while a more common variant has been associated with T2D in northern European populations [73]. Methylation levels in the body of KLF11, previously associated with insulin resistance (cg20853880 and cg05301188) [74], and cg02841972 upstream of KLF11, associated here with FLI, have previously been associated with T2D [62] and CRP [75], further linking methylation in this region with the metabolic dysfunction underlying the co-morbidity of T2D and MASLD.
In addition to cg06690548 (SLC7A11), out of the 9 CpG sites which associated with GGT levels in at least two of our discovery cohorts, blood methylation levels of cg19693031 (TXNIP) and cg14476101 (PHGDH) have been previously associated with NAFLD in adults [19], with cg14476101 (PHGDH) also associating with steatosis score and NAFLD in adolescence [56]. Methylation of cg00163198 (SNX19), measured only with the EPIC array (not 450 K), has also previously been associated with liver fat content [20].The lack of replication is partly due to the fact that only three of the discovered sites are present in Illumina 450 K array, previously used in the majority of analyses. In the three-generational YFS 2018 data, cg06690548 (SLC7A11), cg14476101 and cg26457483 (PHGDH), and cg19693031 (TXNIP) showed significant association with GGT only in the older G1 (43–58 years) and G0 (59–93 years) generations, with the strongest associations in G0. The difference is intriguing, as G1 and G0 have very similar BMIs, liver enzyme levels and prevalence of liver steatosis, and G0 had consumed less alcohol than G1. As G0 are the parents of G1, genetic differences should not offer a major explanation either. Of these, the negative association of cg19693031 (TXNIP) with liver fat content in longitudinal analysis has also been previously reported [20]. Even more perplexing is the association between cg27516100 (DHX16) and cg00163198 (SNX19) with GGT levels in the younger G2 and G1, but not in G0.
Even though meQTLs were discovered for CpGs in PHGDH (cg14476101 and cg26457483) and cg22699725 (PFKFB2), none associated with GGT levels or attenuated the association between the methylation sites and GGT. Similarly, although most of the CpGs associated with GGT also associate with the expression of nearby genes, the levels of these transcripts were not strongly associated with GGT levels. It must be noted that our gene expression data are from peripheral blood and, for example, changes in PHGDH expression in the liver have been linked to lipid-associated genes, indicating a role in liver function [76].
In addition to those CpGs associating with GGT, we discovered 15 additional CpGs associating with FLI in at least two of our discovery cohorts. Of these, cg06500161 (ABCG1) has been associated with NAFLD [19] and liver fat content [20] in a cross-sectional setting, and cg06500161 and cg16740586 (ABCG1) also in longitudinal analysis [20], while cg00574958 (CPT1A) has been associated with NAFLD in a cross-sectional setting [19]. ABCG1 facilitates cholesterol efflux to HDL, whereas CPT1A catalyses the rate-limiting step of beta-oxidation of long-chain fatty acids. Methylation of these genes has been strongly linked to obesity [57], and in previous analyses reporting their association with NAFLD and liver fat content, these associations diminished after adjusting with BMI [19, 20]. Methylation levels in SREBF1, a transcription factor regulating de novo lipogenesis, have been associated with liver steatosis in many studies [18–20, 56]. However, while these studies report an association between cg11024682 (SREBF1) and liver steatosis, we observe an association between cg20544516 (SREBF1) and FLI. Instead of liver steatosis, cg20544516 has previously been associated with T2D [62] and triglyceride levels [77], linking it closely to the development of MASLD.
In the three generations participating the YFS 2018 follow-up, most of the DNA methylation sites associating with FLI show the strongest association in the oldest G0 and the weakest association in the youngest G2 generation. Interestingly, cg22699725 (PFKFB2) and cg06088069 (JDP2), which had similar association levels with GGT in all generations, have an increasing association with FLI from the youngest to the older generations. This highlights that these indicators of liver health could be describing different aspects of the underlying pathology.
We also identified several meQTLs for CpGs associated with FLI. Similar to those associated with GGT, adding meQTL in the model predicting FLI levels did not diminish the association—and in some cases, such as cg14476101 (PHGDH1), cg17075888 (PDK4) and cg10919522 in C14orf43, adding genetic variation to the model even strengthened the association between the CpG and FLI. Unlike in the GGT analysis, transcripts associated with FLI-linked CpGs were often themselves associated with FLI. For example, the effects of cg06500161 (ABCG1) and cg20544516 (SREBF) methylation on FLI levels were mediated by the blood transcript levels of these genes. This indicates that the link between these CpG sites and FLI could be more systemic—rather than liver-specific—and that the biological link between methylation and FLI levels may, at least partly, involve regulation of nearby gene expression.
Like all studies, also ours has some limitations. Firstly, the YFS longitudinal data encompassing the 1986, 2011, and 2018 follow-ups is limited in size and consists of several batches. Therefore, we chose not to perform true longitudinal analysis but instead conducted repeated cross-sectional analyses. Liver enzyme levels were also not measured in the YFS 1986 follow-up, further limiting longitudinal analyses. The datasets utilised here consist of mainly white Europeans, and the results cannot be directly generalised to other ethnicities. Importantly, we observed some discrepancies between the Finnish datasets and the German, cg12973487 (TCF3) which associated significantly with FLI in German cohorts but not in any of the populations or settings in the Finnish YFS. From all the cohorts, the DNA methylation levels have been measured from blood leukocytes, and the associations should thus be considered to reflect the systematic components of liver steatosis and the crosstalk between the liver and the immune system. In the mediation analysis, even though the confounders deemed relevant in the mediation question were controlled for, due to the cross-sectional nature of these analyses, reverse causality cannot be fully ruled out as an explanation of the results. Thus, rather than interpreting these results through a strictly causal lens, the mediation analysis should be interpreted as a statistical decomposition of the association between DNA methylation and FLI.
Our study also has several notable strengths in comparison to previous studies. We utilise the Illumina EPIC array with 850,000 CpG sites, compared to the approximately 450,000 available on the 450 K array used in most previous studies. In our discovery analysis, we require the CpG to associate with the liver variable in at least two independent cohorts and in the meta-analysis involving over 5500 individuals. Furthermore, we are able to investigate the association between the discovered CpGs and liver variables across three different generations and age groups in the multigenerational YFG 2018 follow-up.
Conclusions
We show that in YFS, methylation at cg06690548 (SLC7A11) is lower in individuals with ultrasonically-identified liver steatosis, with this association becoming more pronounced in older individuals. We further identify overlapping groups of CpGs associating with both GGT levels and FLI, many of which also exhibit stronger associations with these liver health indicators in older YFS generations. Although several meQTLs were identified for these CpGs which are linked to liver health, these meQTLs did not affect the associations with liver-related traits, nor were they independently associated with liver health. Interestingly, some of the associations between the identified CpGs and FLI were mediated by whole blood gene expression, suggesting a possible broader systemic connection. In contrast, methylation levels at those CpGs associated with GGT rarely associated with expression of nearby genes, pointing to potentially distinct mechanisms underlying these biomarkers of liver function. Taken together, our findings contribute to a deeper understanding of the pathophysiology of liver diseases and provide insights in the systematic development of steatotic liver and provide information on the crosstalk between the liver and the circulatory immune cells.
Supplementary Information
Supplementary Material 1. Supplementary Material 2.
The reference list from the paper itself. Each links out to its DOI / PubMed record.
- 1Holle R, Happich M, Lowel H, Wichmann HE, Group MS. KORA–a research platform for population based health research. Gesundheitswesen Bundesverband der Arzte des Offentlichen Gesundheitsdienstes (Germany). 2005; 67 Suppl 1:S 19-25.10.1055/s-2005-85823516032513 · doi ↗ · pubmed ↗
- 2Laaksonen J, Mishra PP, Seppälä I, Raitoharju E, Marttila S, Mononen N, Lyytikäinen L-P, Kleber ME, Delgado GE, Lepistö M, Almusa H, Ellonen P, Lorkowski S, März W, Hutri-Kähönen N, Raitakari O, Kähönen M, Salonen JT, Lehtimäki T. Mitochondrial genome-wide analysis of nuclear DNA methylation quantitative trait loci. Hum Mol Genet. 2021;ddab 339.10.1093/hmg/ddab 339PMC 912265335077545 · doi ↗ · pubmed ↗
- 3Taliun D, Harris DN, Kessler MD, Carlson J, Szpiech ZA, Torres R, Taliun SAG, Corvelo A, Gogarten SM, Kang HM, Pitsillides AN, Le Faive J, Lee S-B, Tian X, Browning BL, Das S, Emde A-K, Clarke WE, Loesch DP, Shetty AC, Blackwell TW, Smith AV, Wong Q, Liu X, Conomos MP, Bobo DM, Aguet F, Albert C, Alonso A, Ardlie KG, Arking DE, Aslibekyan S, Auer PL, Barnard J, Barr RG, Barwick L, Becker LC, Beer RL, Benjamin EJ, Bielak LF, Blangero J, Boehnke M, Bowden DW, Brody JA, Burchard EG, Cade BE, Casella · doi ↗ · pubmed ↗
- 4Suomela E, Oikonen M, Virtanen J, Parkkola R, Jokinen E, Laitinen T, Hutri-Kahonen N, Kahonen M, Lehtimaki T, Taittonen L, Tossavainen P, Jula A, Loo BM, Mikkila V, Younossi Z, Viikari JS, Juonala M, Raitakari OT. Prevalence and determinants of fatty liver in normal-weight and overweight young adults. The Cardiovasc Risk Young Finns Stud. Ann Med 2014; 1–7.10.3109/07853890.2014.96675225333756 · doi ↗ · pubmed ↗
- 5Peters TJ, Buckley MJ, Statham AL, Pidsley R, Samaras K, V Lord R, Clark SJ, Molloy PL. De novo identification of differentially methylated regions in the human genome. Epigenetics & Chromatin. 2015;8:6.10.1186/1756-8935-8-6PMC 442935525972926 · doi ↗ · pubmed ↗
