Coverage statistics for sequence census methods
Steven N. Evans, Valerie Hower, Lior Pachter

TL;DR
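This paper extends the classic Lander-Waterman model to account for fragment length distributions, shows that the fragments from a sequencing experiment can be viewed as a two-dimensional spatial Poisson process, and introduces the shape of a coverage function as a tool for detecting coverage aberrations in high-throughput sequencing data.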
Contribution
It provides a new theoretical framework for modeling sequencing coverage that accounts for fragment length variability, and it introduces a novel approach for visualizing sequencing data.
Findings
Fragments can be viewed as a two-dimensional spatial Poisson process regardless of fragment length distribution, provided fragment start sites are Poisson distributed.
The jump skeleton of the coverage function forms Galton-Watson trees with computable parameters.
Provides a null model for detecting deviations in high-throughput sequencing coverage.
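The null model summarized above can be illustrated with a small simulation. The following is a minimal sketch, not the authors' code: the function names (`simulate_fragments`, `coverage_at`, `coverage_jumps`), the uniform length distribution, and all parameter values are assumptions chosen for illustration. It draws start sites from a homogeneous Poisson process, attaches an independent length to each start, and compares the empirical coverage with the standard expectation of start rate times mean fragment length.

```python
import random


def simulate_fragments(genome_length, rate, sample_length, rng):
    """Draw fragments as (start, length) pairs.

    Start sites form a homogeneous Poisson process with the given rate
    (exponential gaps between consecutive starts); lengths are i.i.d.
    """
    fragments = []
    position = rng.expovariate(rate)
    while position < genome_length:
        fragments.append((position, sample_length(rng)))
        position += rng.expovariate(rate)
    return fragments


def coverage_at(fragments, x):
    """Count fragments whose interval [start, start + length) contains x."""
    return sum(1 for start, length in fragments if start <= x < start + length)


def coverage_jumps(fragments):
    """Return sorted (position, step) events: +1 at each start, -1 at each end."""
    events = []
    for start, length in fragments:
        events.append((start, 1))
        events.append((start + length, -1))
    return sorted(events)


# Illustrative parameters (assumptions, not values from the paper).
rng = random.Random(0)
genome_length = 50_000
rate = 0.05                                     # expected starts per base
uniform_length = lambda r: r.uniform(100, 300)  # mean fragment length 200

fragments = simulate_fragments(genome_length, rate, uniform_length, rng)

# Sample positions away from the edges so coverage is stationary there.
positions = [rng.uniform(1_000, genome_length - 1_000) for _ in range(200)]
mean_coverage = sum(coverage_at(fragments, x) for x in positions) / len(positions)
expected_coverage = rate * 200                  # start rate times mean length
print(f"empirical {mean_coverage:.2f} vs expected {expected_coverage:.2f}")
```

Each simulated fragment is a point (start, length) in the plane, which is the two-dimensional view of the data described in the findings. Swapping `uniform_length` for another sampler changes the length distribution without touching the Poisson start-site assumption, and `coverage_jumps` exposes the up and down steps of the coverage function that a shape-based analysis would work from.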
Abstract
Background: We study the statistical properties of fragment coverage in genome sequencing experiments. In an extension of the classic Lander-Waterman model, we consider the effect of the length distribution of fragments. We also introduce the notion of the shape of a coverage function, which can be used to detect aberrations in coverage. The probability theory underlying these problems is essential for constructing models of current high-throughput sequencing experiments, where both sample preparation protocols and sequencing technology particulars can affect fragment length distributions. Results: We show that regardless of fragment length distribution and under the mild assumption that fragment start sites are Poisson distributed, the fragments produced in a sequencing experiment can be viewed as resulting from a two-dimensional spatial Poisson process. We then study the jump…
Taxonomy
Topics: Bioinformatics and Genomic Networks · Genomics and Phylogenetic Studies · Genomics and Chromatin Dynamics
