DNA methylation associated with the serum alanine aminotransferase concentration: evidence from Chinese monozygotic twins
Jingxian Li, Jia Luo, Tong Wang, Xiaocao Tian, Chunsheng Xu, Weijing Wang, Dongfeng Zhang

TL;DR
This study finds that DNA methylation at specific sites is linked to blood ALT levels in Chinese twins, suggesting epigenetic factors influence liver function.
Contribution
The study identifies novel CpG sites and genes, such as FLT4 and RELB, associated with ALT levels through epigenetic mechanisms in a Chinese population.
Findings
85 CpGs were found to be genome-wide significant, located in 16 genes including FLT4 and RELB.
Causal inference showed that 61 CpGs in 14 genes influence ALT levels.
Validation in a community sample confirmed hypomethylation and hypermethylation in CpGs linked to abnormal ALT.
Abstract
To identify nongenetic factors influences on DNA methylation (DNAm) variations associated with blood Alanine Aminotransferase (ALT) concentration, this study conducted an epigenome-wide association study (EWAS) on Chinese monozygotic twins. A total of 61 pairs of Chinese monozygotic twins involved in this study. Whole blood samples were analyzed for DNAm profiling using the Reduced Representation Bisulfite Sequencing (RRBS) technique. We examined the relationship between DNAm levels at each CpG site and serum ALT using a linear mixed-effects model. Enrichment analysis and causal inference analysis was conducted, and differentially methylated regions (DMRs) were further identified. Candidate CpGs were validated in a community sample. Genome-wide significance were calculated by Bonferroni correction (p < 2.14 × 10–7). We identified 85 CpGs reaching genome-wide significance (p < 2.14 ×…
Genes, proteins, chemicals, diseases, species, mutations and cell lines named across the full text — each resolved to its canonical identifier and authoritative record.
Click any figure to enlarge with its caption.
Figure 1
Figure 2
Figure 3
Figure 4- —https://doi.org/10.13039/501100001809National Natural Science Foundation of China
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsEpigenetics and DNA Methylation · Genetic Syndromes and Imprinting · Genetics and Neurodevelopmental Disorders
Background
Alanine aminotransferase (ALT) is a critical enzyme that is predominantly located in the liver, and its elevated levels are a frequent indicator of liver damage [1, 2]. In addition to indicating liver damage, elevated ALT levels also function as a predictive marker for a variety of ailments and all-cause mortality [3–5]. Consequently, it is imperative to comprehend the factors that influence the serum ALT level in order to promote human health.
As a complex trait, serum ALT concentration may be influenced by a combination of genetic and environmental factors. The extent of genetic contributions to ALT variability has been widely studied, revealing significant correlations between genetic factors and ALT levels. Previous studies found that ALT concentrations exhibit high heritability although in different races, ranging from 22 to 65% [6–11]. Moreover, the genetic pathways underlying variations in ALT levels have also begun to be uncovered by genome-wide association studies, or genome-wide association studies (GWASs). Investigations in European whites, Indian Asians [12], Mexican Americans [13], and the Japanese population [14] have pinpointed various loci and genes, including PNPLA3 and SAMM50, associated with ALT levels. Nonetheless, the previously identified genetic variants account for only a part of the genetic basis of ALT.
Recent years have provided compelling evidence of the crucial role that epigenetic mechanisms play in altering gene expression and increasing disease risk. Several epigenome-wide association studies (EWASs) have been carried out to investigate the relationship between complicated traits such as blood pressure [15], blood lipids [16], body mass index (BMI) and waist circumference [17], type 2 diabetes [18, 19], cardiovascular disease [20] and genomic DNA methylation (DNAm) variations. However, while some studies have explored the correlation between DNAm and ALT [21], studies specifically focused on East Asian populations are particularly limited. This gap is particularly important because previous studies have demonstrated differences in DNA methylation across ethnicities/races, suggesting that ethnicity/race may be an influential factor in epigenetic regulation [22–24]. Moreover, the causal relationship—whether DNAm directly affects serum ALT levels—remains unclear. Therefore, further EWAS and causal inference analyses are essential.
Both genetic and non-genetic factors can contribute to epigenetic modifications [25]. However, previous EWASs primarily utilized samples from unrelated individuals, which often overlooked confounding effects arising from diverse genetic backgrounds. Fortunately, this limitation can be addressed using monozygotic twin models, as those twins share identical genetic information. This design effectively mitigates biases associated with individual genetic variations, thereby providing a clearer understanding of the relationship between DNAm caused by external factors (such as diet, smoking, and alcohol consumption) and disease risk [26, 27].
In this study, we leveraged an ALT-discordant monozygotic twin design, where co-twins within each pair exhibited different serum ALT levels, to conduct an EWAS exploring the correlation between DNAm at specific CpG sites and ALT levels. This discordant twin approach is particularly advantageous, as it controls for shared genetics and many early environmental influences, allowing for more precise identification of epigenetic variations associated with ALT. Furthermore, we explored the potential causal relationship between DNAm and ALT levels. To validate our findings, the identified candidate CpGs were subsequently tested in a community-based population sample.
Materials and methods
Participants
Participants were recruited through the Qingdao Twin Registry [28], with recruitment details previously documented [29]. Exclusions comprised individuals with hepatitis, pregnant or breastfeeding women, and those who were unconscious or unwilling to participate. A discordant trait monozygotic twin design was employed, selecting pairs with a difference in serum ALT levels of ≥ 1 U/L. Methylation analysis was conducted on 61 pairs of monozygotic twins (122 individuals) with discordant ALT levels, where the absolute median difference in serum ALT levels within pairs was 7.00 U/L (interquartile range [IQR]: 3.00–11.00). The median age of the participants was 52 years (IQR: 46–57), and 49.2% (n = 60) were women. These demographic characteristics of the participants are summarized in Table 1.Table 1. Demographic and health information of monozygotic twinsnALT(U/L)Age (years)Gender woman (n, %)BMI (kg/ m^2^,IQR)Alcohol consumption (100 g/day, IQR)Hypertension (n, %)Diabetes (n, %)12220 (12,26)52 (46,57)60(49.2%)25 (22.40,27.48)0 (0,2)67(54.92%)12(9.84%)^IQR: interquartile ranges^
The study was approved by the Regional Ethics Committee of the Institutional Review Board of the Qingdao Center for Disease Control and Prevention, adhering to the ethical principles of the Declaration of Helsinki. All participants provided written informed consent prior to the study.
Data collection, health examination and zygosity determination
A health examination and a questionnaire were completed by each pair of cotwins. For the blood sample collection, each participant provided 10 millilitres of venous blood following an overnight 10–12 h fast. Serum and plasma were separated from blood cells and stored at − 80 °C for 30 min. A Hitachi 7600 semiautomatic analyzer from Japan was used to measure the concentrations of serum ALT and blood glucose [15, 16, 30].
Covariates and personal information
The covariates in the current study are age, gender, alcohol consumption, body mass index, hypertension status, and diabetes status, as determined by previous research. A summary of these covariates is provided in Table 1.
Questionnaires were distributed at the local Qingdao CDC service center or at community hospitals/clinics to gather personal information, including age, sex, and alcohol consumption (as measured by the question "Currently, what is your average daily alcohol consumption?"). The formula weight (kg)/height (m)^2 was employed to determine BMI from the measured height and weight data. Hypertension was defined as a measured systolic blood pressure exceeding 140 mmHg, a diastolic blood pressure exceeding 90 mmHg, or self-reported hypertension. Self-report or fasting blood glucose levels exceeding 7 mmol/L were used to ascertain diabetes status.
Reduced representation bisulfite sequencing (RRBS) experiment
Total DNA was isolated from the whole blood sample in the RRBS experiment. In brief, genomic DNA was digested using a restriction enzyme to generating short fragments with CpG-rich regions. Subsequently, bisulfite conversion was performed to analyze DNA methylation patterns. Raw methylation data spanning 551,447 CpGs per participant's genome was subsequently obtained by constructing and sequencing cDNA libraries. Bismark [31] was employed to align the methylation data to the human GRCh37 reference genome, and BiSeq [32] was employed to normalize the data, with coverage limited to the 90th percentile. The quality control process excluded CpGs with a mean methylation β-value < 0.01 or > 0.99 or more than 10 missing observations, resulting in 233,720 CpGs that were relevant to ALT for further analysis. Methylation β-value was log2-transformed into M-value for statistical analysis. Given that DNA was sourced from whole blood, accounting for cell type heterogeneity was crucial to avoid potential biases [33]. To mitigate this, we employed the ReFACTor method, utilizing the top five components to adjust for cell type composition effects on DNAm [34].
Epigenome‑wide association analysis (EWAS)
Through the utilization of the lmer function in the R-package lmerTest, the linear mixed-effects model was implemented to assess the correlation between DNAm M-values (log2-transformed methylation β-values) at each CpG site and ALT levels. In this model, M-values were included as fixed-effect independent variables, while twin pair IDs were incorporated as a random effect to account for the paired structure of the twin data. This model adjusted for covariates including age, gender, alcohol consumption, BMI, hypertension status, diabetes status, and cell type composition. To account for the paired structure of twin data, a random effect was included to identify twin pairs within the linear mixing model. Bonferroni correction was applied to adjust for multiple testing of 233,720 CpGs, setting a genome-wide significance threshold at p < 2.14 × 10^–7^ (Bonferroni correction p value < 0.05) [35]. The R package biomaRt was used to annotate CpGs with p < 0.05 to the nearest gene [36, 37].
Power estimation for EWAS
Given the lack of a standardized sample size formula for discordant monozygotic twin designs, we estimated statistical power through simulation-based approaches, which evaluated the statistical power of EWAS using a disease-discordant twin design [26]. The study demonstrated that both higher phenotypic heritability (h2) and a stronger correlation between DNA methylation and environmental factors (R2_M,E_) lead to a smaller required sample size to achieve adequate power, assuming other factors remain constant.
To adopt a conservative approach, we selected R2_M,E_ = 0.1, as a lower DNA methylation–environment correlation results in a larger estimated sample size. For h2, while a lower value would be more conservative, we set h2 = 0.6, which is slightly lower than the previously reported ALT heritability (h2 = 0.65) in the Chinese population [11]. This choice ensured that our power estimation remained biologically plausible while accounting for potential variability.
Under an extreme case where the intra-pair correlation due to shared genetics or environment ρε = 0.1, the simulation results indicated that a sample size of 63 pairs would be sufficient to achieve 80% power. This suggests that a sample size of approximately 60 twin pairs is adequate for reaching the desired statistical power in disease-discordant twin EWAS studies. Therefore, we estimated that our study, which included 61 twin pairs, will achieve approximately 80% statistical power.
Causal inference analysis
To assess the potential causal relationship between serum ALT levels and CpGs identified at the epigenome-wide significance level (p < 2.14 × 10⁻⁷), we applied the Inference about Causation through Examination of Familial Confounding (ICE FALCON) method. ICE FALCON is a regression-based approach specifically designed for causal inference in family-based studies, particularly in twin studies, as it accounts for both genetic and shared environmental confounders [38, 39].
Based on this method, we fitted three regression models to the twin-pair data:
Model 1: The outcome variable (serum ALT level) of each twin was regressed on their own predictor variable (CpG methylation) to estimate the unconditional regression coefficient “βself”, which captures both the causal effect and familial confounding.
Model 2: The outcome variable of each twin was regressed on their co-twin’s predictor variable to estimate “βco-twin”, which reflects the influence of familial confounding alone.
Model 3: The outcome variable was regressed on both the twin’s own predictor and their co-twin’s predictor to obtain the conditional regression coefficients “β’self” and “β’co-twin” Unlike “βself” and “βco-twin”, these coefficients indicate the change in outcome associated with a change in the predictor while keeping the other predictor constant, allowing a more refined causal interpretation. The regression models for the ICE FALCON method are presented in Fig. 1.Fig. 1. Manhattan plot of the effects of EWASs on ALT levels
The criteria for causal inference were as follows:
If the absolute difference between |βco-twin—β’co-twin| is similar to |βself—β’self|, then the observed association is likely due to familial confounding rather than causality.
Conversely, a causal effect is indicated if the absolute value of the ratio: |βco-twin—β’co-twin|/ |βself—β’self| (absolute value of ratio) is greater than 1.5, suggesting that within-pair differences in CpG methylation correspond to within-pair differences in ALT levels, independent of familial confounding.
To estimate the parameters, we employed a generalized estimating equations model, which accounts for the correlation structure within twin pairs. The rationale behind these statistical criteria has been supported by previous methodological studies [38, 40].
Region‑based analysis
The comb-p was implemented to identify differentially methylated regions (DMRs) that are linked to ALT [41]. A region was considered significantly enriched with DMRs if the Stouffer-Liptak-Kechris (slk) corrected p-value was less than 0.05.
Ontology enrichments analysis
The ontology enrichment of identified CpGs (p < 0.05) was evaluated using the online instrument Genomic Regions Enrichment of Annotations instrument (GREAT) [42].
Annotations were conducted using the human GRCh37 genome reference and the default "basal plus extension" association rule. An ontology was considered statistically significant if its Bonferroni correction p value was less than 0.05.
Validation using independent sample
We selected genes for quantitative methylation analysis to validate our EWAS results based on following criteria: 1) results of top signals in EWAS associated with our target trait; 2) the gene was involved in important biological function and pathways that may potentially influence; 3) the gene was where the CpGs having causal relationship with ALT in the causal inference analysis were located; 4) the primers of the gene could be successfully designed.
Blood collection and preservation procedures were consistent with previous methods. Validation of DNAm for the top CpGs was performed using matrix-assisted laser desorption ionization time-of-flight (MALDI-TOF) mass spectrometry. Mass spectra were obtained using a MassARRAY Compact MALDI-TOF system (Sequenom; BioMiao Biological Technology, Beijing, China), and methylation ratios were generated with the EpiTYPER software (Sequenom, San Diego, CA). Multiple primer schemes were designed to cover regions containing the top CpGs, with final primer optimization carried out using Epidesigner software. The EpiTYPER™ software system facilitated quantitative DNAm analysis, and the mean methylation status of closely adjacent CpGs was computed.
To validate our EWAS results, we conducted an independent replication analysis in a community-based cohort from Qingdao, China. Participants were included if they were aged ≥ 40 years and had available data on ALT levels, BMI, alcohol intake, diabetes status, and blood pressure. Individuals with active hepatitis and pregnant or breastfeeding women, were excluded from this cohort. The validation cohort consisted of 54 individuals with abnormal ALT levels (ALT > 40 U/L) and 162 matched controls, selected using propensity score matching (PSM) to minimize potential confounding effects. Cases and controls were matched at a 1:3 ratio based on age and sex, using nearest-neighbor PSM with a caliper value of 0.1. This method ensures that the matched individuals are comparable in baseline characteristics, improving the validity of our findings. The effect of DNAm to ALT was evaluated using the conditional logistic regression model, which was adjusted for match group, BMI, alcohol intake, diabetes, and hypertension status.
Results
Epigenome-wide association analysis
Sixty-one pairs of monozygotic twins were included in the methylation study; their median serum ALT value was 20 U/L (IQR: 12–26). The demographic and clinical characteristics of the participants are summarized in Table 1. The median age was 52 years (IQR: 46–57), and 49.2% (n = 60) were women. The median ALT level was 20 U/L (IQR: 12–26), and the median BMI was 25.0 kg/m^2^ (IQR: 22.40–27.48). Among the participants, 54.92% (n = 67) had hypertension, and 9.84% (n = 12) had diabetes. The median alcohol consumption was 0 g/day (IQR: 0–2).
Figure 2 displays the Manhattan plot for the EWAS on ALT levels. We identified 85 ALT-related CpG sites with p values less than 2.14 × 10^–7^. Table 2 displays the top 35 significant CpG sites, while all 85 significant CpG sites are detailed in Table S1.Fig. 2. Differentially Methylated Regions (DMRs) associated with serum ALT LevelsTable 2Top 35 ^a^Significant CpG Sites Associated with ALT LevelsChromosomePositionp valueβGeneGene function^b^chr5180,046,2271.26E-10− 0.089FLT4A tyrosine kinase receptor involved in lymphangiogenesis and endothelial maintenancechr5180,046,2241.26E-10− 0.089FLT4chr5180,046,2351.38E-10− 0.089FLT4chr5180,046,2401.47E-10− 0.089FLT4chr5180,046,2441.53E-10− 0.089FLT4chr5180,046,2491.60E-10− 0.089FLT4chr5180,046,2581.91E-10− 0.088FLT4chr5180,046,2702.22E-10− 0.088FLT4chr5180,046,2752.30E-10− 0.088FLT4chr5180,046,2902.85E-10− 0.089FLT4chr5180,046,3054.79E-10− 0.089FLT4chr101,517,2776.88E-10− 0.084ADARB2An RNA-editing enzyme involved in RNA regulationchr101,517,2581.35E-09− 0.086ADARB2chr1320,139,1212.16E-09− 0.070MRPS31P2–chr101,517,2422.56E-09− 0.087ADARB2An RNA-editing enzyme involved in RNA regulationchr101,517,2373.26E-09− 0.088ADARB2chr101,517,3073.98E-09− 0.082ADARB2chr1945,532,1794.21E-090.066RELBA transcription factor involved in immune regulation and lymphocyte differentiationchr101,517,3108.07E-09− 0.081ADARB2chr101,517,2258.79E-09− 0.089ADARB2chr101,517,2199.89E-09− 0.089ADARB2chr449,163,5021.14E-080.063–chr1572,490,4791.24E-080.067–chr449,163,5091.27E-080.063–chr1945,532,1701.28E-080.064RELBchr449,163,5191.45E-080.063–chr1945,532,2081.47E-080.080RELBchr1945,532,2011.52E-080.079RELBchr1945,532,2141.52E-080.079RELBchr1945,532,1961.77E-080.079RELBchr12,928,7151.87E-080.066–chr1945,532,2352.39E-080.080RELBchr101,517,3132.39E-08− 0.081ADARB2chr12,928,7482.69E-080.066–chr12,928,7333.11E-080.066–^a^The 35 CpGs represent the top significant sites based on the linear mixed-effects model results, with the full list available in the Table S1. ^b^Gene function information from the National Center for Biotechnology Information (NCBI) https://www.ncbi.nlm.nih.gov/
The 11 strongest associations (β = − 0.089 to − 0.088, p = 1.26 × 10^–10^ to 4.79 × 10^–10^) were found for the CpG sites located on chromosome 5 (180,046,224 to 180,046,305 bp) within the FLT4 gene, all of which presented significant statistical associations (p < 2.14 × 10^–7^).
These top CpG sites were located within 16 genes, including FLT4, ADARB2, MRPS31P2, RELB, DACT2, MATK, ZBTB7B, ENSG00000259268, ENSG00000240093, VAX1, OPLAH, PCNT, PLCB3, WDR20, LY6E, and PAX7. These genes were also found near 10 additional genes (refer to Tables 2 and S1).
Causal inference analysis
Table 3 displays the causal inference analysis results for the top 35 CpGs. The comprehensive results for the top CpGs (n = 85, p < 2.14 × 10^–7^) are detailed in Table S2. The findings strongly support a causal effect of DNAm on ALT levels for 61 CpGs located within 14 genes: FLT4, MRPS31P2, ADARB2, RELB, DACT2, MATK, ZBTB7B, ENSG00000259268, VAX1, OPLAH, PCNT, PLCB3, LY6E, PAX7, and nearly 7 other genes. Notably, among these 61 significant CpGs, only one CpG located in PCNT (chr21, position: 47,808,888), which is located in the PCNT gene, had a causal effect on the serum ALT level in DNAm.Table 3. Causal Inference Analysis Results for the Top 35 CpG SitesChromosomePositionGeneMethylation to ALTALT to methylationβself changepself changeβco-twin changep_co-twin change_AVRβself changepself changeβco-twin changep_co-twin change_AVRchr5180,046,227FLT4− 0.3250.117− 1.1710.0063.5993.52E-040.9560.0420.128–chr5180,046,224FLT4− 0.3270.112− 1.1660.0063.5714.12E-040.9480.0420.128–chr5180,046,235FLT4− 0.3240.123− 1.1740.0063.6293.15E-040.9600.0420.125–chr5180,046,240FLT4− 0.3230.128− 1.1780.0063.6512.68E-040.9660.0420.124–chr5180,046,244FLT4− 0.3210.132− 1.1820.0073.6811.98E-040.9750.0420.123–chr5180,046,249FLT4− 0.3190.139− 1.1890.0073.7298.33E-050.9890.0420.123–chr5180,046,258FLT4− 0.3110.154− 1.1950.0083.843− 2.24E-040.9720.0410.130–chr5180,046,270FLT4− 0.3040.164− 1.2100.0083.981− 0.0010.9170.0410.143–chr5180,046,275FLT4− 0.3010.168− 1.2180.0084.043− 0.0010.8960.0410.148–chr5180,046,290FLT4− 0.2990.153− 1.2280.0054.109− 0.0010.8500.0400.160–chr5180,046,305FLT4− 0.2910.136− 1.1930.0034.104− 0.0020.8290.0400.172–chr101,517,277ADARB2− 0.3020.468− 1.0950.113–0.0040.5880.0510.083–chr101,517,258ADARB2− 0.3030.401− 1.0840.074–0.0030.6520.0500.092–chr1320,139,121MRPS31P2− 0.552 < 0.001− 1.727 < 0.0013.128− 0.0020.7230.0370.179–chr101,517,242ADARB2− 0.3180.293− 1.0970.0343.4460.0030.7020.0500.094–chr101,517,237ADARB2− 0.3220.254− 1.1050.0253.4340.0020.7420.0500.095–chr101,517,307ADARB2− 0.3410.276− 0.9920.0222.9050.0070.3460.0500.083–chr1945,532,179RELB0.3870.1491.8530.0074.7830.0020.684− 0.0340.115–chr101,517,310ADARB2− 0.3540.218− 0.9880.0122.7910.0070.3320.0490.086–chr101,517,225ADARB2− 0.3370.140− 1.1160.0103.3070.0020.8270.0500.095–chr101,517,219ADARB2− 0.3650.104− 1.1790.0043.2290.0010.8500.0510.091–chr449,163,502–0.3810.6891.5240.498–0.0010.871− 0.0300.267–chr1572,490,479–0.8090.0692.6420.0023.2660.0110.112− 0.0480.089–chr449,163,509–0.3490.7441.4720.554–0.0010.876− 0.0290.285–chr1945,532,170RELB0.4250.1001.7550.0104.1240.0020.758− 0.0320.141–chr449,163,519–0.3160.7881.3850.607–0.0010.898− 0.0280.306–chr1945,532,208RELB0.2320.3451.4560.0256.2900.0030.680− 0.0360.123–chr1945,532,201RELB0.2520.2591.5370.0116.1020.0040.626− 0.0370.113–chr1945,532,214RELB0.2260.3621.4470.0276.4000.0030.690− 0.0360.125–chr1945,532,196RELB0.2520.2501.5430.0096.1220.0040.611− 0.0370.115–chr12,928,715–0.5500.2141.6280.0252.960− 0.0010.806− 0.0390.150–chr1945,532,235RELB0.1950.4491.3660.0296.9980.0030.733− 0.0350.130–chr101,517,313ADARB2− 0.3760.132− 0.9860.0032.6190.0070.3100.0490.091–chr12,928,748–0.6030.0661.744 < 0.0012.890− 9.21E-050.987− 0.0400.153–chr12,928,733–0.5810.1341.7110.0042.947− 4.83E-040.929− 0.0400.152–AVR: Absolute value of ratio, AVR =|βco-twin change| / |βself change|, AVR > 1.5 indicates a causal effect
Region-based analysis
Fifty-two DMRs were found to be associated with serum ALT levels. Table 4 provides information on a portion of significant DMRs, while the details for all 52 DMRs are included in Table S3.Table 4. Summary of Partial Significant Differentially Methylated Regions (DMRs) Associated with Serum ALT LevelsChromosomeStart (bp)End (bp)LengthStouffer–Liptak–Kechris (slk) corrected p-valueGeneNearest genechr5180,045,726180,046,393281.53E-11FLT4chr101,517,1521,517,314101.73E-10ADARB2chr449,163,42749,163,595115.50E-09–ENSG00000222437chr123,280,00223,280,168218.93E-08LACTBL1chr1945,531,92145,532,286151.72E-07RELBchr1632,289,93932,290,060141.79E-07–ENSG00000259822chr1043,428,61143,429,087207.57E-07–ENSG00000229630chr2130,975,639130,975,784122.66E-06–RHOQP3chr2186,794,503186,794,653127.97E-05–RPL21P32chr1914,584,40414,584,550181.59E-04PTGER1chr12,491,1302,491,396131.73E-04TNFRSF14chr1746,645,89246,646,045103.50E-04HOXB3, HOXB-AS3
As shown in Fig. 3, some DMRs presented varying correlations with the serum ALT level. For example, Fig. 3A shows a DMR located in the FLT4 gene, which is negatively correlated with serum ALT. Conversely, Fig. 3E depicts a DMR located in the RELB gene, which is positively correlated with the serum ALT level.Fig. 3. Potential mechanisms by which FLT4 and RELB genes influence serum ALT levels
Among the 52 identified DMRs, 24 exhibited positive correlations with serum ALT levels, another 21 showed negative correlations, and the association trend for the remaining 7 DMRs remained uncertain. The complete set of 52 DMRs is displayed in the supplementary materials (Fig. S1).
Ontology enrichment analysis
A total of 16,109 genomic cis-regulatory regions associated with various genes were identified from the EWAS results of serum ALT according to GRCh37/hg19 (Fig. S2a). Figs. S2b and c display the absolute distance and orientation of the genomic cis-regulatory areas with respect to the transcription start site (TSS).
Multiple pathways linked to serum ALT levels were found in the pathway and biological process enrichment studies, including low voltage-gated calcium channel activity and focal adhesion. Partial results are presented in Table 5, while all results with Bonferroni correction p < 0.05 are detailed in Table S4.Table 5. Partial pathways and biological processes enriched for serum ALT levelsOntology databaseTerm namep valueBonferroni correction p valueFold enrichmentObserved region hitsRegion set coverageGO molecular functionlow voltage-gated calcium channel activity2.54E-999.37E-9629.084930.006GO molecular functionDNA binding3.27E-871.21E-831.30446010.283GO biological processgene expression9.30E-819.71E-771.24257260.352GO biological processRNA metabolic process2.71E-802.83E-761.25154620.336GO molecular functionthromboxane A2 receptor activity1.19E-784.38E-75149.335440.003GO biological processthromboxane A2 signaling pathway1.19E-781.24E-74149.335440.003GO biological processRNA biosynthetic process2.64E-782.75E-741.27947150.290GO biological processcortisol biosynthetic process2.59E-772.70E-7338.055650.004GO biological processaldosterone biosynthetic process2.59E-772.70E-7338.055650.004GO biological processtranscription, DNA-dependent5.02E-775.24E-731.27946450.286GO biological processmitral valve formation4.44E-754.64E-7136.577640.004GO biological processnucleic acid metabolic process5.21E-725.44E-681.21560530.372Human phenotypeGenu recurvatum9.34E-725.74E-687.2401430.009Human phenotypeDifficulty climbing stairs3.01E-711.85E-676.8631480.009GO biological processorganic cyclic compound biosynthetic process8.91E-709.30E-661.23952220.321GO molecular functionnucleic acid binding1.89E-686.96E-651.22256420.347GO biological processnucleobase-containing compound biosynthetic process2.07E-672.16E-631.24549510.305GO biological processmacromolecule metabolic process1.89E-661.98E-621.12796490.594GO biological processmacromolecule biosynthetic process8.57E-668.95E-621.21158150.358GO biological processcellular metabolic process2.75E-652.87E-611.10810,8060.665GO biological processcellular nitrogen compound biosynthetic process1.09E-641.14E-601.23450760.312Human phenotypeUrethral stricture4.80E-642.95E-6074.524430.003Human phenotypeSkin fragility with non-scarring blistering4.80E-642.95E-6074.524430.003Human phenotypeOnychogryposis of toenails4.80E-642.95E-6074.524430.003GO biological processaromatic compound biosynthetic process6.21E-646.48E-601.23450360.310GO biological processregulation of macromolecule biosynthetic process2.36E-632.47E-591.19163570.391GO biological processmetabolic process2.65E-632.77E-591.09511,5790.713GO molecular functionsequence-specific DNA binding2.97E-621.10E-581.42921710.134GO biological processregulation of RNA biosynthetic process3.17E-623.31E-581.20059420.366GO biological processcellular macromolecule biosynthetic process4.78E-624.99E-581.20856780.349BioCyc pathwayalanine biosynthesis II1.43E-104.76E-0814.046120.001BioCyc pathwayalanine degradation1.43E-104.76E-0814.046120.001MSigDB pathwayFocal adhesion3.44E-054.55E-021.2104560.028
Results of validation
We selected candidate genes in accordance with the criteria provided in the ‘Methods’ section. Ultimately, we quantified 9 CpGs in the FLT4 gene (originally 11, but 2 were not detected) and 7 CpGs in the RELB gene via the Sequenom MassARRAY platform. The validation cohort consisted of 216 individuals, comprising 54 cases and 162 controls. The baseline characteristics of the validation cohort are summarized as follows: In the ALT normal group, 102 participants were women, accounting for 63% of the group, with a median age of 49.5 years (IQR: 46–55). Similarly, in the ALT abnormal group, 34 participants were women (63%), with a median age of 49.5 years (IQR: 47–57).
There is statistically significant evidence (p < 0.05) of a substantial negative connection between serum ALT and four of the top CpGs in the FLT4 gene. Furthermore, a significant positive relationship (p < 0.05) between the serum ALT level and three of the top CpGs in the RELB gene was confirmed. The results are presented in Supplementary Table S5.
Discussion
To our knowledge, no previous methylation analyses of serum ALT have focused on the Chinese or East Asian population, highlighting a significant gap in the literature. This study investigated the epigenetic factors influencing serum ALT levels in the Chinese population by conducting an EWAS on a cohort of monozygotic twins. Our analysis revealed that 85 CpG sites were significantly associated with serum ALT levels. Additionally, we identified multiple genes, DMRs, and pathways that may elucidate the mechanisms underlying changes in the serum ALT level. For further validation, several candidate CpGs located in the FLT4 and RELB genes were quantified and confirmed. The insights garnered from our analyses offer significant contributions to understanding the epigenetic regulation of liver function, and provides a theoretical basis for future investigations into whether the methylation status of these genes could serve as early diagnostic biomarkers for liver dysfunction.
While previous studies have focused on European cohorts [21], genetic background and environmental exposures differ across populations, potentially influencing DNA methylation patterns. Our study contributes novel insights into the epigenetic regulation of ALT levels in an East Asian population, an area that has been underexplored. Future studies in diverse populations are warranted to determine whether the identified epigenetic associations with ALT are generalizable across ethnic groups through EWAS or DNA methylation score [43], which can further expand epigenetic studies to other populations to gain a more comprehensive understanding of liver function regulation.
Among the CpG sites that ranked at the top of Table 2, the top 11 were located within the FLT4 gene. The FLT4 gene encodes VEGFR3, known for its involvement in the vascular endothelial growth factor (VEGF) and vascular endothelial growth factor receptor (VEGFR) signaling pathways [44], which is crucial for lymphangiogenesis, vascular remodeling, and hepatic microcirculation homeostasis [45], emerged as a critical locus in our study. Several animal studies have suggested mechanisms through which FLT4 may influence ALT levels. One study suggested that VEGF might stimulate sinusoidal endothelial cells to produce cytokines such as hepatocyte growth factor. This process increases the expression of Bcl-xL in hepatocytes, thereby reducing apoptosis. Researchers have therefore postulated that via intercellular contacts, VEGF exerts a strong antiapoptotic effect on hepatocytes [46].
In addition, FLT4 is involved in the focal adhesion pathway, a crucial regulator of cell adhesion, migration, and tissue remodeling. Our GREAT binomial test results (Table 5) indicated significant enrichment of this pathway, suggesting its potential link to serum ALT levels. The FLT4 gene participates in the generation of focal adhesion kinase (FAK), which has been implicated in hepatic injury response and fibrosis progression [47, 48]. Moreover, studies have shown that VEGF interacts with nuclear factor erythroid 2-related factor 2 (Nrf2), inducing the Nrf2 signaling pathway to counteract oxidative stress. Through a positive feedback loop, VEGF further enhances Nrf2 activation and promotes its own expression, thereby protecting cells from oxidative damage [49]. This mechanism may contribute to hepatocyte protection against oxidative stress-induced injury, ultimately influencing serum ALT levels. These findings suggest that FLT4 may contribute to liver function regulation through VEGF, FAK signaling, and Nrf2. However, further studies are needed to clarify whether FLT4 methylation directly modulates these pathways in hepatocytes or sinusoidal endothelial cells.
Among the top CpGs, seven sites are located on the RELB gene. RELB belongs to the family of mammalian nuclear factor kappa-B (NF-κB) [50]. Many biological processes, including the immunological response, inflammatory responses, cell proliferation and survival, and development, are influenced by NF-κB [51, 52]. Consequently, hepatocyte damage and subsequent changes in serum ALT levels may be caused by the NF-κB signaling pathway, which may be involved in the biological processes of liver inflammation. Additionally, studies have shown that the use of NF-κB inhibitors can suppress RELB expression, thereby alleviating liver injury [53]. Moreover, RELB may interact with the aryl hydrocarbon receptor (AhR) to regulate the transcription of chemokines, thereby contributing to the inflammatory response [54]. These findings further suggest that RELB might influence serum ALT levels by impacting liver damage. To better illustrate the potential mechanisms through which FLT4 and RELB might influence ALT levels, Fig. 4 provides a visual summary of the key pathways involved.Fig. 4. Schematic illustration of the potential mechanisms by which FLT4 and RELB influence serum ALT levels
Some CpG sites may exhibit co-methylation patterns within specific genomic regions. To account for such regional effects, we performed DMR analysis, which confirmed significant associations between DNAm and serum ALT levels across different genomic regions where the top CpGs are located, including the sites in FLT4 and RELB. While DMRs help capture biologically relevant patterns, the dynamic nature of DNA methylation differs from the stable linkage disequilibrium (LD) structures seen in GWAS. Additionally, the absence of a standardized LD-equivalent reference for methylation makes fine mapping in EWAS particularly challenging. Future studies incorporating large-scale methylation datasets and co-methylation network analyses may provide further insights into these regional methylation effects.
The results of causal inference consistently supported the directional effect of the majority of top CpGs on ALT levels. Furthermore, validation via the Sequenom MassARRAY platform in an independent cohort confirmed our initial findings, with significant correlations observed for CpGs in FLT4 and RELB, which was consistent with the direction of the initial analysis. These additional layers of validation affirm the robustness and reliability of our conclusions.
The solitary CpG site (chr21, position: 47,808,888) in the PCNT gene showing a reverse causal effect, where ALT levels influence DNAm, is intriguing and warrants further investigation.
Using gene set enrichment analysis, significant gene sets associated with reduced voltage-gated calcium channel activity were found in genes linked to CpGs with low p values. Although direct research is limited, several studies have shown that calcium channel blockers can protect the liver. For example, a randomized controlled experiment showed that acute alcoholic hepatitis might be treated with amlodipine, a calcium channel antagonist [55]. Furthermore, R.J. Nauta et al. reported that in a model of ischemia‒reperfusion-induced liver injury, significant mitochondrial calcium loading and an increase in calcium and oxygen-derived free radicals during ischemia might cause mitochondrial damage in a model of ischemia‒reperfusion-induced liver injury [56]. These studies suggest that calcium channel activation may be associated with liver injury, subsequently affecting serum ALT levels.
This study has several noteworthy advantages. First, MZ twins share identical genetic material, which significantly reduces genetic confounding when studying epigenetic modifications [27]. This unique feature allows for a more accurate assessment of environmental influences on DNA methylation and serum ALT levels without the interference of genetic variability. Second, by performing causal inference analysis and validating the findings in an independent cohort, we were able to confirm that DNAm changes at most of the top CpGs have a directional effect on ALT levels. This combination of causal inference and validation strengthens the evidence for the potential causative role of these epigenetic modifications. Another key strength of our study is the incorporation of an independent community-based validation cohort, which enhances the robustness and potential generalizability of our findings. By applying PSM, we minimized confounding effects related to age and sex, strengthening the validity of our DNAm-ALT associations. These strengths collectively highlight the robustness and applicability of our findings for understanding the epigenetic regulation of liver function.
Although our study offers valuable insights, it has several limitations. The recruitment and identification of eligible twins presented considerable challenges, resulting in a relatively restricted sample size. While the trait-discordant twin design enhances statistical power compared to conventional cross-sectional or case‒control studies, the relatively small sample size may still limit the ability to detect subtle DNAm-ALT associations. Additionally, despite our previous research demonstrating a high heritability of serum ALT levels (65%) [11] and an estimated statistical power of approximately 80% with over 60 twin pairs [26], a larger sample size could further strengthen the robustness of our findings. Moreover, as our study population was limited to Chinese individuals, the generalizability of our results to other ethnic groups remains uncertain. Future studies with larger and more diverse populations are needed to validate these associations and explore the underlying biological mechanisms. In addition, while our study focused on ALT as a key marker of liver function, future research could benefit from a broader exploration of liver enzymes such as aspartate aminotransferase (AST) and gamma-glutamyl transferase (GGT) to further elucidate the epigenetic mechanisms underlying liver health. As part of our ongoing research, we plan to expand our investigation to include these biomarkers, which may provide additional insights into the epigenetic regulation of liver function. Finally, although we performed validation in a community-based population, further validation in larger independent cohorts or through functional experiments would strengthen our findings.
Conclusion
In conclusion, our work establishes a basis for further investigation into the epigenetic processes underlying liver health and highlights the critical role that DNAm plays in controlling serum ALT levels. The identification of specific CpGs in FLT4 and RELB associated with ALT levels offers promising avenues for developing epigenetic biomarkers and therapeutic interventions for liver injury. The robustness of our findings, supported by DMR analysis, causal inference, and external validation, underscores the potential of DNAm as a key player in the regulation of liver function.
Supplementary Information
Additional file 1.Additional file 2.
The reference list from the paper itself. Each links out to its DOI / PubMed record.
- 1KEGG. Focal adhesion - Reference pathway 2023 [updated 1/25; cited 2024]. Available from: https://www.kegg.jp/pathway/map 04510.
