A Reference-Free Algorithm for Computational Normalization of Shotgun Sequencing Data
C. Titus Brown, Adina Howe, Qingpeng Zhang, Alexis B. Pyrkosz, Timothy H. Brom

TL;DR
This paper introduces digital normalization, a computational method that reduces data size and error in shotgun sequencing datasets, improving assembly efficiency without losing significant information.
Contribution
The paper presents a novel single-pass algorithm for digital normalization that streamlines shotgun sequencing data processing and assembly.
Findings
Reduces dataset size significantly
Decreases memory and time for assembly
Maintains content integrity of assembled contigs
Abstract
Deep shotgun sequencing and analysis of genomes, transcriptomes, amplified single-cell genomes, and metagenomes has enabled investigation of a wide range of organisms and ecosystems. However, sampling variation in short-read data sets and high sequencing error rates of modern sequencers present many new computational challenges in data interpretation. These challenges have led to the development of new classes of mapping tools and de novo assemblers. These algorithms are challenged by the continued improvement in sequencing throughput. We here describe digital normalization, a single-pass computational algorithm that systematizes coverage in shotgun sequencing data sets, thereby decreasing sampling variation, discarding redundant data, and removing the majority of errors. Digital normalization substantially reduces the size of shotgun data sets and decreases the memory and time…
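To make the single-pass idea in the abstract concrete, here is a minimal Python sketch. It assumes that a read's coverage is estimated as the median abundance of its k-mers among the reads kept so far, and that a read is discarded once that estimate reaches a cutoff. The function names, the parameters `k` and `cutoff`, and their default values are illustrative assumptions, not taken from the text above. The plain in-memory `Counter` is a stand-in for whatever memory-bounded counting structure a real implementation would use.

```python
from collections import Counter
from statistics import median


def kmers(seq, k):
    """Yield every overlapping k-mer of seq."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def normalize(reads, k=20, cutoff=20):
    """Single pass over reads: keep a read only while its estimated
    coverage (median k-mer count so far) is below cutoff.

    k and cutoff are hypothetical defaults chosen for illustration.
    """
    counts = Counter()  # exact counts; a stand-in for a compact counter
    kept = []
    for read in reads:
        read_kmers = list(kmers(read, k))
        if not read_kmers:
            continue  # read is shorter than k, so no estimate is possible
        # Counter returns 0 for unseen k-mers without storing them.
        if median(counts[km] for km in read_kmers) < cutoff:
            counts.update(read_kmers)  # only kept reads add to the counts
            kept.append(read)
    return kept


if __name__ == "__main__":
    # Toy input: one sequence sampled ten times, another sampled twice.
    reads = ["ACGTTGCAAG"] * 10 + ["TTTTGGGGCC"] * 2
    print(normalize(reads, k=4, cutoff=3))
```

Because only kept reads update the counts, heavily sampled regions stop contributing once they reach the cutoff, while low-coverage regions pass through untouched. This is one plausible way to even out coverage and discard redundant data in a single pass.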
Taxonomy
Topics: Genomics and Phylogenetic Studies · Microbial Community Ecology and Physiology · Protist diversity and phylogeny
