Counting unique molecular identifiers in sequencing using a multitype branching process with immigration
Serik Sagitov, Anders Ståhlberg

TL;DR
This paper models the process of counting unique molecular identifiers (UMIs) in sequencing using a multitype branching process with immigration, revealing an asymptotic pattern that aids in interpreting sequencing data.
Contribution
It introduces a novel branching process model for PCR barcoding, accounting for multiple amplification rates and providing insights into UMI cluster distributions.
Findings
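Asymptotic pattern: E(C_t(m))/E(C_t) ≈ 2^{-m} for large t, where C_t is the number of UMI clusters after t PCR cycles and C_t(m) is the number of clusters of size m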
Model distinguishes five different amplification rates
Results help interpret sequencing outcomes more accurately
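One way to use the 2^{-m} pattern when interpreting data is to compare observed UMI cluster-size fractions against it. The sketch below is a minimal illustration, not the authors' method. It reads C_t as the number of UMI clusters and C_t(m) as the number of clusters of size m. It assumes the input is a plain list with one size per UMI cluster, and the helper names `cluster_size_fractions` and `compare_to_geometric` are hypothetical. Note that the finding concerns a ratio of expectations, E(C_t(m))/E(C_t), whereas a single sequencing run only gives the ratio of observed counts, which is a rough estimate of it.

```python
from collections import Counter


def cluster_size_fractions(cluster_sizes, max_m):
    """Return empirical fractions C_t(m) / C_t for m = 1..max_m.

    cluster_sizes: one entry per UMI cluster, giving how many molecules
    share that UMI (this input format is an assumption for the sketch).
    """
    total_clusters = len(cluster_sizes)  # plays the role of C_t
    if total_clusters == 0:
        raise ValueError("need at least one UMI cluster")
    counts = Counter(cluster_sizes)  # counts[m] plays the role of C_t(m)
    return {m: counts.get(m, 0) / total_clusters for m in range(1, max_m + 1)}


def compare_to_geometric(cluster_sizes, max_m):
    """Pair each empirical fraction with the 2^{-m} reference value."""
    empirical = cluster_size_fractions(cluster_sizes, max_m)
    return {m: (empirical[m], 2.0 ** -m) for m in empirical}


if __name__ == "__main__":
    # Made-up cluster sizes for illustration only, not data from the paper.
    sizes = [1, 1, 1, 1, 2, 2, 3, 5]
    for m, (obs, ref) in compare_to_geometric(sizes, 4).items():
        print(f"m={m}: observed {obs:.3f}, 2^-m = {ref:.3f}")
```

Since the pattern is stated for m = 1, 2, … at large t, only the small cluster sizes are expected to track 2^{-m} closely; large-m fractions in real data need not follow it.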
Abstract
Detection of extremely rare variant alleles, such as tumour DNA, within a complex mixture of DNA molecules is experimentally challenging due to sequencing errors. Barcoding of target DNA molecules in library construction for next-generation sequencing provides a way to identify and bioinformatically remove polymerase-induced errors. During a barcoding procedure involving t consecutive PCR cycles, the DNA molecules become barcoded by unique molecular identifiers (UMIs). Different library construction protocols utilise different values of t. The effect of a larger t and of imperfect PCR amplification is poorly described. This paper proposes a branching process with growing immigration as a model describing the random outcome of t cycles of PCR barcoding. Our model discriminates between five different amplification rates for different types of…
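To make the phrase "branching process with immigration" concrete, here is a deliberately simplified toy simulation. It is not the paper's model. It uses a single amplification rate `r` instead of five type-specific rates, and a constant-rate immigration of new UMI clusters from a fixed pool of `n_templates` untagged templates (each starting a new cluster with probability `p_new` per cycle) instead of the paper's growing immigration. The function name and all parameters are hypothetical.

```python
import random


def simulate_toy_barcoding(t, n_templates, p_new, r, seed=None):
    """Toy single-type branching process with immigration (hypothetical).

    Each cycle:
      1. every existing barcoded molecule is copied with probability r
         (binary splitting: each molecule yields 1 or 2 molecules);
      2. each of the n_templates untagged templates starts a new UMI
         cluster of size 1 with probability p_new (immigration).
    Returns the list of cluster sizes after t cycles.
    """
    rng = random.Random(seed)
    clusters = []  # clusters[i] = number of molecules carrying UMI i
    for _ in range(t):
        # Amplification step: each molecule independently duplicates w.p. r.
        clusters = [
            size + sum(1 for _ in range(size) if rng.random() < r)
            for size in clusters
        ]
        # Immigration step: new UMI clusters appear from the templates.
        newcomers = sum(1 for _ in range(n_templates) if rng.random() < p_new)
        clusters.extend([1] * newcomers)
    return clusters


if __name__ == "__main__":
    sizes = simulate_toy_barcoding(t=10, n_templates=50, p_new=0.3, r=0.9, seed=0)
    print(len(sizes), "clusters; largest has", max(sizes), "molecules")
```

With perfect amplification (`r=1.0`) every cluster born in cycle s ends with exactly 2^{t-s} molecules, so cluster sizes are powers of two; imperfect amplification (`r < 1`) spreads them out. No claim is made that this toy reproduces the 2^{-m} pattern, which the paper derives for its own multitype model.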
Taxonomy
Topics: Cancer Genomics and Diagnostics · Gene expression and cancer classification · Molecular Biology Techniques and Applications
