Snowball: Strain aware gene assembly of Metagenomes
I. Gregor, A. Schönhuth, A. C. McHardy

TL;DR
Snowball is a novel reference-free, strain-aware gene assembler for metagenomic data that uses profile HMMs to improve accuracy and distinguish strain variants without requiring reference genomes.
Contribution
It introduces a new strain-aware, reference-free gene assembly method that employs profile HMMs and read quality scores for improved accuracy in metagenomics.
Findings
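Per-base error rates are very low because assembly from read overlaps is combined with error correction based on read quality scores.
Runs in parallel on a user-defined number of processor cores, including on a standard laptop.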
Successfully distinguishes strain variants without reference genomes.
Abstract
Gene assembly is an important step in functional analysis of shotgun metagenomic data. Nonetheless, strain aware assembly remains a challenging task, as current assembly tools often fail to distinguish among strain variants or require closely related reference genomes of the studied species to be available. We have developed Snowball, a novel strain aware and reference-free gene assembler for shotgun metagenomic data. It uses profile hidden Markov models (HMMs) of gene domains of interest to guide the assembly. Our assembler performs gene assembly of individual gene domains based on read overlaps and error correction using read quality scores at the same time, which results in very low per-base error rates. The software runs on a user-defined number of processor cores in parallel, runs on a standard laptop and is freely available for installation under Linux or OS X on:…
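The idea of combining overlapping reads with quality-based error correction can be illustrated with a minimal sketch. This is not Snowball's actual algorithm, which the abstract does not specify in detail. It assumes the reads have already been placed on a shared coordinate axis (for example, by alignment to a profile HMM), and the function name `quality_weighted_consensus` and the `(offset, sequence, qualities)` input layout are hypothetical choices made for illustration. At each position the base with the highest summed Phred quality wins, a common heuristic.

```python
from collections import defaultdict


def quality_weighted_consensus(reads):
    """Build a consensus from overlapping reads (illustrative sketch only).

    reads: list of (offset, sequence, qualities) tuples, where offset is the
    read's start on a shared coordinate axis and qualities are integer Phred
    scores, one per base. Assumes the reads cover a contiguous region.
    """
    # votes[position][base] accumulates the Phred scores supporting that base.
    votes = defaultdict(lambda: defaultdict(int))
    for offset, seq, quals in reads:
        if len(seq) != len(quals):
            raise ValueError("sequence and quality lengths differ")
        for i, (base, q) in enumerate(zip(seq, quals)):
            votes[offset + i][base] += q

    # At each covered position, keep the base with the highest summed quality.
    consensus = []
    for pos in sorted(votes):
        best_base, _ = max(votes[pos].items(), key=lambda kv: kv[1])
        consensus.append(best_base)
    return "".join(consensus)


# Hypothetical example: the low-quality 'A' at position 3 in the second read
# is outvoted by the two high-quality 'T' calls from the other reads.
example_reads = [
    (0, "ACGT", [30, 30, 30, 30]),
    (2, "GAAC", [30, 10, 30, 30]),
    (2, "GTAC", [30, 30, 30, 30]),
]
print(quality_weighted_consensus(example_reads))  # ACGTAC
```

Summing Phred scores corresponds to multiplying per-base error probabilities on a log scale, so a single confident call can outweigh several low-quality ones. A strain-aware assembler would additionally need to avoid merging reads that originate from different strains, which this sketch does not attempt.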
Taxonomy
Topics: Genomics and Phylogenetic Studies · Microbial Community Ecology and Physiology · Bioinformatics and Genomic Networks
