BAUM: A DNA Assembler by Adaptive Unique Mapping and Local Overlap-Layout-Consensus
Anqi Wang, Zheng Li, Zhanyu Wang, Lei M. Li

TL;DR
BAUM is a novel DNA assembler that uses adaptive unique mapping and local overlap-layout-consensus strategies to improve genome assembly, especially in repetitive regions, demonstrated on wild rice with superior contig N50.
Contribution
It introduces a new assembler BAUM that combines adaptive read mapping, statistical structural variation detection, and local OLC assembly, differing from traditional de Bruijn graph methods.
Findings
Achieved a contig N50 of 18.8 kb in the wild rice genome assembly.
Improved assembly quality compared to existing assemblers.
Validated assembly with independent long-read data.
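For readers unfamiliar with the headline metric, contig N50 is the length L such that contigs of length at least L together cover at least half of the assembled bases. The following is a minimal sketch of that standard definition, not code from BAUM; the function name `contig_n50` and the example lengths are hypothetical.

```python
def contig_n50(lengths):
    """Return the contig N50 for a collection of contig lengths.

    N50 is the length of the contig at which the running total,
    taken from longest to shortest, first reaches half of the
    total assembled bases.
    """
    lengths = list(lengths)
    if not lengths:
        raise ValueError("need at least one contig length")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Compare 2 * running with total to avoid fractional halves.
        if 2 * running >= total:
            return length


# Hypothetical contig lengths (in bases), for illustration only.
example_lengths = [80, 70, 50, 40, 30, 20]
example_n50 = contig_n50(example_lengths)
```

A higher N50 means more of the assembly sits in long contigs, which is why it is commonly reported when comparing assemblers on the same data.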
Abstract
Genome assembly from the high-throughput sequencing (HTS) reads is a fundamental yet challenging computational problem. An intrinsic challenge is the uncertainty caused by the widespread repetitive elements. Here we get around the uncertainty using the notion of uniquely mapped (UM) reads, which motivated the design of a new assembler BAUM. It mainly consists of two types of iterations. The first type of iterations constructs initial contigs from a reference, say a genome of a species that could be quite distant, by adaptive read mapping, filtration by the reference's unique regions, and reference updating. A statistical test is proposed to split the layouts at possible structural variation sites. The second type of iterations includes mapping, scaffolding/contig-extension, and contig merging. We extend each contig by locally assembling the reads whose mates are uniquely mapped to an…
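The notion of uniquely mapped (UM) reads can be illustrated with a small sketch. It assumes, as a simplification, that a read counts as uniquely mapped when the aligner reports exactly one hit for it; the abstract excerpt here does not state BAUM's actual criteria (for example its adaptive mapping thresholds), so the `Hit` record and the `uniquely_mapped` function below are hypothetical.

```python
from collections import Counter
from typing import Iterable, List, NamedTuple


class Hit(NamedTuple):
    """One reported alignment of a read (hypothetical record layout)."""
    read_id: str
    ref_name: str
    ref_start: int


def uniquely_mapped(hits: Iterable[Hit]) -> List[Hit]:
    """Keep only hits whose read has exactly one reported alignment.

    Assumption: 'unique' means a single reported hit per read. A real
    pipeline would also consider mapping quality and mismatch counts.
    """
    hits = list(hits)  # allow any iterable, since we pass over it twice
    counts = Counter(h.read_id for h in hits)
    return [h for h in hits if counts[h.read_id] == 1]


# Illustrative data: r2 aligns to two places, so it is filtered out.
example_hits = [
    Hit("r1", "chr1", 100),
    Hit("r2", "chr1", 500),
    Hit("r2", "chr3", 900),
    Hit("r3", "chr2", 42),
]
um_hits = uniquely_mapped(example_hits)
```

Reads such as `r2` that align to several places are typical of repetitive elements, which is the source of ambiguity the abstract says the UM notion is meant to get around.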
Taxonomy
Topics: Genomics and Phylogenetic Studies · RNA and protein synthesis mechanisms · Chromosomal and Genetic Variations
