Estimating population structure using epigenome-wide methylation data
Ziqing Wang, Kent D Taylor, Jerome I Rotter, Stephen S Rich, Yinan Zheng, Lifang Hou, Xiuqing Guo, Jan Bressler, Laura M Raffield, Yongmei Liu, Robert Kaplan, Donald M Lloyd-Jones, Alanna C Morrison, Myriam Fornage, Bruce M Psaty, Jennifer A Brody, Tamar Sofer

TL;DR
This paper introduces a new method to estimate population structure using DNA methylation data, which can help improve the accuracy of epigenome-wide association studies.
Contribution
The novel contribution is the development of methylation population scores (MPSs) to predict genetic principal components using supervised learning.
Findings
MPSs showed strong correlation with genetic principal components (GPCs), with R² ranging from 0.27 to 0.98.
MPSs outperformed alternative methylation-based methods in differentiating self-reported racial/ethnic groups.
MPSs reduced inflation in EWAS similarly to GPCs and can be used when genetic data are unavailable.
Abstract
Population stratification is one source of inflation in epigenome-wide association studies (EWAS) when not properly accounted for. To address this, we developed methylation population scores (MPSs) to predict genetic principal components (GPCs) using a feature selection approach. We used multi-ethnic DNA methylation data from Illumina EPIC arrays across five cohorts, MESA (n = 929), CARDIA (n = 1123), JHS (n = 1365), ARIC (n = 2338), and HCHS/SOL (n = 1475), randomly splitting participants into training (85%) and test (15%) sets. Within each cohort, associations between GPCs and CpG sites were estimated using linear regression adjusted for age, sex, smoking and alcohol use, race/ethnicity, body mass index, and cell type proportions, followed by meta-analysis and selection of CpGs with FDR < 0.05. We then applied a two-stage weighted least squares Lasso regression to…
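The CpG screening step described in the abstract (per-cohort association estimates, meta-analysis, then selection at FDR < 0.05) might be sketched as follows. This is a minimal illustration and not the authors' code: the abstract does not say which meta-analysis model or FDR procedure was used, so the fixed-effect inverse-variance pooling and the Benjamini-Hochberg adjustment here are assumptions, and the function names `inverse_variance_meta`, `benjamini_hochberg`, and `select_cpgs` are hypothetical. The subsequent Lasso stage is not shown.

```python
import math
import numpy as np


def inverse_variance_meta(betas, ses):
    """Fixed-effect meta-analysis (assumed model, not stated in the abstract).

    betas, ses: arrays of shape (n_cohorts, n_cpgs) holding per-cohort
    regression coefficients and their standard errors.
    """
    betas = np.asarray(betas, dtype=float)
    ses = np.asarray(ses, dtype=float)
    weights = 1.0 / ses**2
    pooled_beta = (weights * betas).sum(axis=0) / weights.sum(axis=0)
    pooled_se = np.sqrt(1.0 / weights.sum(axis=0))
    z = pooled_beta / pooled_se
    # Two-sided p-value under a standard normal reference distribution.
    pvals = np.array([math.erfc(abs(v) / math.sqrt(2.0)) for v in z])
    return pooled_beta, pooled_se, pvals


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values (assumed FDR procedure)."""
    pvals = np.asarray(pvals, dtype=float)
    m = pvals.size
    order = np.argsort(pvals)
    ranked = pvals[order] * m / np.arange(1, m + 1)
    # Enforce monotonicity, working down from the largest p-value.
    adjusted_sorted = np.minimum.accumulate(ranked[::-1])[::-1]
    adjusted = np.empty(m)
    adjusted[order] = np.clip(adjusted_sorted, 0.0, 1.0)
    return adjusted


def select_cpgs(betas, ses, fdr=0.05):
    """Return indices of CpGs whose meta-analysis FDR is below the threshold."""
    _, _, pvals = inverse_variance_meta(betas, ses)
    return np.flatnonzero(benjamini_hochberg(pvals) < fdr)
```

With real data, `betas` and `ses` would come from the covariate-adjusted per-cohort regressions of each CpG on a given GPC; the selected indices would then be the candidate features passed to the penalized regression.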
Taxonomy
Topics: Epigenetics and DNA Methylation · Genetic Associations and Epidemiology · Health, Environment, Cognitive Aging
