
TL;DR
This paper derives exact and approximate statistical distributions of pyrosequencing outcomes, which the authors expect to be useful for developing instruments and software for this next-generation sequencing technology.
Contribution
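It derives exact distributions of pyrosequencing outcomes in terms of the nucleotide probabilities of the target sequences, gives explicit formulas for their mean and variance, and shows that normal approximations with the same mean and variance are accurate.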
Findings
Exact distributions for a fixed number of flow cycles and for a fixed sequence length.
Explicit formulas for mean and variance.
Normal distributions with the same mean and variance approximate both cases accurately.
Abstract
Pyrosequencing is emerging as one of the important next-generation sequencing technologies. We derive the statistical distributions of this technique in terms of nucleotide probabilities of the target sequences. We give exact distributions both for a fixed number of flow cycles and for a fixed sequence length. Explicit formulas are derived for the mean and variance of these distributions. In both cases, the distributions can be approximated accurately by normal distributions with the same mean and variance. The statistical distributions will be useful for instrument and software development for pyrosequencing platforms.
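The abstract does not spell out the model, so the following is only a minimal sketch under stated assumptions, not the paper's derivation. It assumes that template bases are i.i.d. with given probabilities, that nucleotides are flowed in a fixed cyclic order (here a hypothetical FLOW_ORDER = "TACG"), that each flow incorporates the whole homopolymer run without errors, and that the template never runs out. Under these assumptions a base equal to its predecessor is read in the same flow; otherwise it is read (pos(y) - pos(x)) mod 4 flows later. The hypothetical functions below compute the exact distribution of F_m, the number of flows needed to read m bases (fixed sequence length), and obtain the distribution of L_n, the number of bases read in n cycles (fixed number of flow cycles), from the identity P(L_n >= m) = P(F_m <= 4n). A normal CDF with matched mean and variance is printed alongside for comparison.

```python
import math
from collections import defaultdict

# Assumption (not from the paper): nucleotides are flowed in this cyclic order.
FLOW_ORDER = "TACG"


def _initial(probs, pos):
    # The first base x_1 is read in flow number pos[x_1] + 1.
    return {(base, pos[base] + 1): p for base, p in probs.items()}


def _extend(state, probs, pos):
    """Append one i.i.d. base to every (last_base, flows_used) state."""
    k = len(pos)
    nxt = defaultdict(float)
    for (last, flows), q in state.items():
        for base, p in probs.items():
            # Same base as before: read in the same flow (homopolymer run).
            # Otherwise wait (pos[base] - pos[last]) mod k further flows.
            nxt[(base, flows + (pos[base] - pos[last]) % k)] += q * p
    return nxt


def flows_distribution(m, probs, order=FLOW_ORDER):
    """Exact distribution of F_m, the flows needed to read the first m >= 1 bases."""
    pos = {b: i for i, b in enumerate(order)}
    state = _initial(probs, pos)
    for _ in range(m - 1):
        state = _extend(state, probs, pos)
    dist = defaultdict(float)
    for (_, flows), q in state.items():
        dist[flows] += q
    return dict(dist)


def length_distribution(n_cycles, probs, order=FLOW_ORDER, tol=1e-12):
    """Distribution of L_n, the bases read in n flow cycles, truncated once P(L_n >= m) < tol.

    Assumes no single base has probability 1; otherwise L_n is unbounded.
    """
    pos = {b: i for i, b in enumerate(order)}
    n_flows = n_cycles * len(order)
    # F_m never decreases as m grows, so P(L_n >= m) = P(F_m <= n_flows).
    surv = [1.0]  # surv[m] = P(L_n >= m)
    state = _initial(probs, pos)
    while True:
        # Paths that already exceed the flow budget can never come back.
        state = {s: q for s, q in state.items() if s[1] <= n_flows}
        surv.append(sum(state.values()))
        if surv[-1] < tol:
            break
        state = _extend(state, probs, pos)
    return {m: surv[m] - surv[m + 1] for m in range(len(surv) - 1)}


def mean_var(dist):
    mean = sum(x * p for x, p in dist.items())
    return mean, sum((x - mean) ** 2 * p for x, p in dist.items())


def normal_cdf(x, mean, var):
    return 0.5 * (1.0 + math.erf((x - mean) / math.sqrt(2.0 * var)))


if __name__ == "__main__":
    uniform = {b: 0.25 for b in FLOW_ORDER}
    dist = length_distribution(25, uniform)
    mu, var = mean_var(dist)
    print(f"L_25: mean={mu:.3f}, var={var:.3f}")
    # Exact CDF next to a continuity-corrected normal CDF, within 2 standard deviations.
    cdf = 0.0
    for m in sorted(dist):
        cdf += dist[m]
        if abs(m - mu) <= 2 * math.sqrt(var):
            print(m, round(cdf, 4), round(normal_cdf(m + 0.5, mu, var), 4))
```

As a sanity check on the sketch: with uniform probabilities and one cycle, L_1 >= m exactly when the first m bases appear in non-decreasing flow order, which gives E[L_1] = (4/3)^4 - 1 = 175/81. Since F_m is a sum of weakly dependent per-base increments, a central-limit argument makes a matched-mean-and-variance normal fit plausible for this assumed model; the printed table lets one inspect it directly.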
