Predicting genome-wide DNA methylation using methylation marks, genomic position, and DNA regulatory elements
Weiwei Zhang, Tim D Spector, Panos Deloukas, Jordana T Bell, Barbara E Engelhardt

TL;DR
This study develops a highly accurate random forest classifier to predict genome-wide DNA methylation at single CpG sites using genomic features, surpassing previous methods and revealing key regulatory elements involved in methylation regulation.
Contribution
We introduce a novel classifier that predicts CpG methylation levels genome-wide with high accuracy and identify key genomic features influencing methylation.
Findings
Achieves 91-94% prediction accuracy at the single-CpG-site level
Reaches 98% accuracy within CpG islands
Identifies genomic features such as transcription factor binding sites (TFBS) and DNase I hypersensitive sites (DHS) as predictive of methylation
Abstract
Background: Recent assays for individual-specific genome-wide DNA methylation profiles have enabled epigenome-wide association studies to identify specific CpG sites associated with a phenotype. Computational prediction of CpG site-specific methylation levels is important, but current approaches tackle average methylation within a genomic locus and are often limited to specific genomic regions. Results: We characterize genome-wide DNA methylation patterns, and show that correlation among CpG sites decays rapidly, making predictions solely based on neighboring sites challenging. We built a random forest classifier to predict CpG site methylation levels using as features neighboring CpG site methylation levels and genomic distance, and co-localization with coding regions, CpG islands (CGIs), and regulatory elements from the ENCODE project, among others. Our approach achieves 91% to 94% prediction…
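To make the feature description in the abstract more concrete, here is a minimal sketch of how a per-site feature vector of that kind might be assembled: the methylation level of and distance to each neighboring CpG site, plus binary flags for overlap with annotated regions. This is not the authors' code. The function names, the dictionary layout, the half-open interval convention, and the use of `None` for a missing neighbor are all assumptions made for illustration.

```python
def overlaps(pos, intervals):
    """Return 1 if pos lies in any half-open [start, end) interval, else 0."""
    return int(any(start <= pos < end for start, end in intervals))


def build_features(idx, positions, betas, annotations):
    """Assemble an illustrative feature dict for the CpG site at positions[idx].

    positions   -- sorted genomic coordinates of assayed CpG sites (one chromosome)
    betas       -- methylation levels in [0, 1], aligned with positions
    annotations -- hypothetical mapping: region name -> list of (start, end)
    """
    pos = positions[idx]
    feats = {}

    # Upstream neighbor: its methylation level and distance to the target site.
    if idx > 0:
        feats["up_beta"] = betas[idx - 1]
        feats["up_dist"] = pos - positions[idx - 1]
    else:
        feats["up_beta"] = None  # assumption: None marks a missing neighbor
        feats["up_dist"] = None

    # Downstream neighbor, handled the same way.
    if idx < len(positions) - 1:
        feats["down_beta"] = betas[idx + 1]
        feats["down_dist"] = positions[idx + 1] - pos
    else:
        feats["down_beta"] = None
        feats["down_dist"] = None

    # One binary co-localization flag per annotation track (e.g. CGI, DHS).
    for name, intervals in sorted(annotations.items()):
        feats["in_" + name] = overlaps(pos, intervals)

    return feats


# Toy data, invented purely for illustration.
positions = [100, 150, 400]
betas = [0.9, 0.8, 0.1]
annotations = {"cgi": [(90, 200)], "dhs": [(380, 420)]}
features = build_features(1, positions, betas, annotations)
```

Dictionaries like `features` could then be converted to numeric rows and passed to any random forest implementation. How missing neighbors are encoded and which annotation tracks are included are design choices the abstract does not specify.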
