Predicting the methylation status of CpG islands from read distribution biases
Eldar T. Abdullaev, Dinesh A. Haridoss, Peter F. Arndt

TL;DR
This paper introduces a method to predict DNA methylation at CpG islands using ordinary short-read sequencing data by analyzing fragmentation biases.
Contribution
The novel contribution is a machine learning tool, WGS2meth, that infers methylation status from read distribution biases without bisulfite or long-read sequencing.
Findings
Methylated CpG sites are approximately 30% more susceptible to fragmentation during library preparation than unmethylated CpG sites.
The proposed machine learning model accurately predicts methylation status of CpG islands from ordinary sequencing reads.
The method is implemented as a tool called WGS2meth for individual or aggregated sample analysis.
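To see how a difference of roughly 30% in break susceptibility could carry a usable signal, the following is a minimal sketch of a binomial likelihood-ratio call for a single CpG island. It is not the WGS2meth model, which the summary describes only as a machine learning tool. The function names, the baseline break probability `p_unmeth`, and the reading of "30% more susceptible" as a 1.3-fold per-site break probability are all assumptions made for illustration.

```python
import math


def log_likelihood_ratio(k, n, p_unmeth, fold=1.3):
    """Log-likelihood ratio of 'methylated' versus 'unmethylated'.

    k        -- observed number of breaks at CpG sites (hypothetical input)
    n        -- number of opportunities for a break at a CpG site
    p_unmeth -- assumed break probability at an unmethylated CpG
    fold     -- assumed fold increase for methylated CpGs (1.3 = 30% more)
    """
    if not 0.0 < p_unmeth < 1.0:
        raise ValueError("p_unmeth must lie strictly between 0 and 1")
    if not 0 <= k <= n:
        raise ValueError("k must satisfy 0 <= k <= n")
    # Cap the methylated probability so the logarithms stay defined.
    p_meth = min(fold * p_unmeth, 1.0 - 1e-12)

    def loglik(p):
        # Binomial log-likelihood; the binomial coefficient is omitted
        # because it cancels in the ratio.
        return k * math.log(p) + (n - k) * math.log(1.0 - p)

    return loglik(p_meth) - loglik(p_unmeth)


def call_methylated(k, n, p_unmeth, fold=1.3):
    """Return True when the data favour the methylated hypothesis."""
    return log_likelihood_ratio(k, n, p_unmeth, fold) > 0.0
```

With an assumed baseline of `p_unmeth=0.10` and `n=1000` opportunities, `call_methylated(130, 1000, 0.10)` favours the methylated hypothesis and `call_methylated(100, 1000, 0.10)` does not. Because the effect is modest, a call of this kind needs many observations per island, which is consistent with the tool supporting aggregated samples.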
Abstract
DNA methylation is an important epigenetic mark that plays a major role in transcriptional regulation, development and genome integrity. There are state-of-the-art methods, such as whole-genome bisulfite sequencing or long-read sequencing, which allow accurate detection of DNA methylation at single-base resolution. However, except for these specialized methods, information about DNA methylation status cannot be obtained directly from ordinary short-read sequencing data. Here we propose an approach to predict the methylation status from mapped read coordinates alone. It relies on previous findings that the DNA fragmentation process during library preparation is not random, but is affected by sequence context. In particular, DNA shearing leads to preferential hydrolysis of the sugar-phosphate backbone at CpG dinucleotides. Notably, methylated CpGs are approximately 30% more susceptible to…
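The idea of reading fragmentation bias from mapped read coordinates alone can be illustrated with a small sketch that compares how often reads start at CpG dinucleotides with how often they start elsewhere in a region. This is an illustrative feature, not the authors' implementation. The function name `cpg_start_enrichment`, the 0-based offsets, and the convention that a break "at a CpG" means the read begins at the G (so the break lies between C and G) are assumptions.

```python
from collections import Counter


def cpg_start_enrichment(sequence, read_starts):
    """Ratio of read-start density at CpG junctions to density elsewhere.

    sequence    -- reference sequence of the region (string)
    read_starts -- 0-based offsets into `sequence` of forward-strand
                   read 5' ends (hypothetical input format)
    Returns None when the ratio is undefined.
    """
    seq = sequence.upper()
    # A junction sits just before offset i (1 <= i < len(seq)).
    # Assumption: it is a CpG junction when seq[i-1] == 'C' and seq[i] == 'G'.
    cpg_sites = {i for i in range(1, len(seq)) if seq[i - 1:i + 1] == "CG"}
    n_other = (len(seq) - 1) - len(cpg_sites)
    if not cpg_sites or n_other <= 0:
        return None

    # Count read starts per offset, ignoring offsets with no junction
    # inside the region (offset 0 and anything out of range).
    counts = Counter(s for s in read_starts if 1 <= s < len(seq))
    at_cpg = sum(c for pos, c in counts.items() if pos in cpg_sites)
    elsewhere = sum(counts.values()) - at_cpg
    if elsewhere == 0:
        return None

    # Density = starts per available junction of each kind.
    return (at_cpg / len(cpg_sites)) / (elsewhere / n_other)
```

A value above 1 means read starts are over-represented at CpG junctions in that region. A real pipeline would also have to handle reverse-strand reads, mapping quality, and other sequence-context effects, none of which are modelled here.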
Taxonomy
Topics: Epigenetics and DNA Methylation · Genomics and Chromatin Dynamics · Machine Learning in Bioinformatics
