Counting unique molecular identifiers in sequencing using a multitype branching process with immigration

Serik Sagitov; Anders Ståhlberg

arXiv:2205.06405·q-bio.QM·June 7, 2022

Open Access

TL;DR

This paper models the process of counting unique molecular identifiers (UMIs) in sequencing using a multitype branching process with immigration, revealing an asymptotic pattern that aids in interpreting sequencing data.

Contribution

It introduces a novel branching process model for PCR barcoding, accounting for multiple amplification rates and providing insights into UMI cluster distributions.

Findings

01. Asymptotic pattern: $E(C_t(m))/E(C_t) \approx 2^{-m}$ for large $t$

02. Model distinguishes five different amplification rates

03. Results help interpret sequencing outcomes more accurately
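As a rough illustration of the asymptotic pattern in the first finding, the sketch below compares an observed share of UMI clusters of each size with $2^{-m}$. It reads $C_t(m)$ as the number of UMI clusters of size $m$ and $C_t$ as the total number of clusters, which is an interpretation assumed here because this page does not define the symbols. The function names and the toy cluster sizes are hypothetical and are not taken from the paper. The pattern concerns a ratio of expectations for large $t$, so a single data set need not match it closely.

```python
from collections import Counter


def expected_proportion(m: int) -> float:
    """Asymptotic expected share of UMI clusters of size m under the 2^{-m} pattern."""
    if m < 1:
        raise ValueError("cluster size m must be a positive integer")
    return 2.0 ** (-m)


def compare_to_pattern(cluster_sizes):
    """Return rows (m, observed share, 2^{-m}) for each observed cluster size m."""
    counts = Counter(cluster_sizes)
    total = sum(counts.values())
    return [(m, counts[m] / total, expected_proportion(m)) for m in sorted(counts)]


# Hypothetical cluster sizes, invented for illustration only (not data from the paper).
toy_sizes = [1] * 8 + [2] * 4 + [3] * 2 + [4] * 2

for m, observed, expected in compare_to_pattern(toy_sizes):
    print(f"m={m}: observed share {observed:.4f}, pattern value {expected:.4f}")
```

Since the values $2^{-m}$ for $m = 1, 2, \ldots$ sum to 1, the pattern describes a geometric distribution with parameter $1/2$, so a large departure in the observed shares may indicate that $t$ is too small for the asymptotics or that cluster sizes are being measured differently from the paper.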

Abstract

Detection of extremely rare variant alleles, such as tumour DNA, within a complex mixture of DNA molecules is experimentally challenging due to sequencing errors. Barcoding of target DNA molecules in library construction for next-generation sequencing provides a way to identify and bioinformatically remove polymerase-induced errors. During the barcoding procedure involving $t$ consecutive PCR cycles, the DNA molecules become barcoded by unique molecular identifiers (UMIs). Different library construction protocols utilise different values of $t$. The effect of a larger $t$ and imperfect PCR amplifications is poorly described. This paper proposes a branching process with growing immigration as a model describing the random outcome of $t$ cycles of PCR barcoding. Our model discriminates between five different amplification rates $r_{1}$, $r_{2}$, $r_{3}$, $r_{4}$, $r$ for different types of…

Taxonomy

Topics: Cancer Genomics and Diagnostics · Gene expression and cancer classification · Molecular Biology Techniques and Applications