Statistical distributions of sequencing by synthesis with probabilistic nucleotide incorporation
Yong Kong

TL;DR
This paper derives statistical models for sequencing by synthesis that account for incomplete nucleotide incorporation. It gives exact distributions, together with explicit formulas for their mean and variance, which are useful for improving sequencing technology and analysis.
Contribution
The paper introduces generalized statistical distributions for sequencing by synthesis that account for incomplete and context-dependent nucleotide incorporation, extending the author's previous models for pyrosequencing.
Findings
Incomplete nucleotide incorporation significantly changes the mean and variance of the distributions.
The distributions can be approximated by normal distributions with the same mean and variance.
Explicit formulas for mean and variance are provided.
Abstract
Sequencing by synthesis is used in many next-generation DNA sequencing technologies. Some of the technologies, especially those exploring the principle of single-molecule sequencing, allow incomplete nucleotide incorporation in each cycle. We derive statistical distributions for sequencing by synthesis by taking into account the possibility that nucleotide incorporation may not be complete in each flow cycle. The statistical distributions are expressed in terms of nucleotide probabilities of the target sequences and the nucleotide incorporation probabilities for each nucleotide. We give exact distributions both for fixed number of flow cycles and for fixed sequence length. Explicit formulas are derived for the mean and variance of these distributions. The results are generalizations of our previous work for pyrosequencing. Incomplete nucleotide incorporation leads to significant change…
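The setting described in the abstract can be illustrated with a small Monte Carlo sketch. It is not the paper's derivation, and the model details are assumptions made for illustration: target bases are drawn independently from `base_probs`, nucleotides are flowed in the fixed order `ACGT` within each cycle, and each matching base is incorporated with probability `inc_probs[base]`, with a failed attempt ending extension for that flow. The names `simulate_length` and `length_stats` are hypothetical. The sketch estimates the mean and variance of the sequence length reached after a fixed number of flow cycles.

```python
import random
from statistics import mean, pvariance

FLOW_ORDER = "ACGT"  # assumed flow order within one cycle


def simulate_length(n_cycles, base_probs, inc_probs, rng):
    """Return the number of bases incorporated after n_cycles flow cycles.

    Assumed model: target bases are i.i.d. with probabilities base_probs;
    a matching base is incorporated with probability inc_probs[base], and
    a failed attempt stops extension for the current flow.
    """
    bases = list(base_probs)
    weights = [base_probs[b] for b in bases]
    length = 0
    # Bases are i.i.d., so the target can be drawn lazily, one base at a time.
    next_base = rng.choices(bases, weights)[0]
    for _ in range(n_cycles):
        for flowed in FLOW_ORDER:
            # Keep extending while the template matches and incorporation succeeds.
            while next_base == flowed and rng.random() < inc_probs[flowed]:
                length += 1
                next_base = rng.choices(bases, weights)[0]
    return length


def length_stats(n_cycles, base_probs, inc_probs, trials=2000, seed=0):
    """Estimate mean and variance of sequence length by simulation."""
    rng = random.Random(seed)
    lengths = [
        simulate_length(n_cycles, base_probs, inc_probs, rng)
        for _ in range(trials)
    ]
    return mean(lengths), pvariance(lengths)


if __name__ == "__main__":
    uniform = {b: 0.25 for b in "ACGT"}
    for q in (1.0, 0.9, 0.5):
        m, v = length_stats(20, uniform, {b: q for b in "ACGT"})
        print(f"incorporation prob {q}: mean={m:.2f}, variance={v:.2f}")
```

Comparing the output for different incorporation probabilities gives a rough feel for how incomplete incorporation shifts the mean and variance. The exact distributions and closed-form expressions are those derived in the paper, not these simulated estimates.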
