Exploring Genome Characteristics and Sequence Quality Without a Reference

Jared T. Simpson

arXiv:1307.8026·q-bio.GN·July 31, 2013·Bioinform.·2 cites

TL;DR

This paper introduces a new software tool for assessing the quality and characteristics of genome sequencing data without needing a reference genome, aiding de novo assembly processes.

Contribution

It presents novel methods for quality assessment and genome characteristic estimation directly from sequencing reads, improving de novo genome assembly workflows.

Findings

01. Calculates error rates and coverage metrics without a reference

02. Estimates genome repeat content and heterozygosity

03. Provides open-source software for genome analysis

Abstract

The de novo assembly of large, complex genomes is a significant challenge with currently available DNA sequencing technology. While many de novo assembly software packages are available, comparatively little attention has been paid to assisting the user with the assembly. This paper addresses the practical aspects of de novo assembly by introducing new ways to perform quality assessment on a collection of DNA sequence reads. The software implementation calculates per-base error rates, paired-end fragment size histograms and coverage metrics in the absence of a reference genome. Additionally, the software will estimate characteristics of the sequenced genome, such as repeat content and heterozygosity, that are key determinants of assembly difficulty. The software described is freely available and open source under the GNU General Public License.
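To make the idea of reference-free coverage metrics concrete, here is a minimal sketch of one common approach: counting k-mers across the reads and looking at the histogram of their occurrence counts. This is an illustration under stated assumptions, not the paper's implementation. The function names (`count_kmers`, `kmer_histogram`, `coverage_mode`), the `min_count` cutoff, and the toy reads are all hypothetical, and the sketch ignores reverse complements and base qualities.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every length-k substring across all reads (forward strand only)."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def kmer_histogram(counts):
    """Map each occurrence count to the number of distinct k-mers seen that often."""
    return Counter(counts.values())


def coverage_mode(histogram, min_count=2):
    """Return the occurrence count shared by the most distinct k-mers.

    Counts below min_count are skipped on the assumption (hypothetical
    cutoff) that very rare k-mers mostly come from sequencing errors.
    Ties are broken in favour of the larger count. Returns None when no
    count reaches min_count.
    """
    candidates = {c: n for c, n in histogram.items() if c >= min_count}
    if not candidates:
        return None
    return max(candidates, key=lambda c: (candidates[c], c))


# Toy input: three identical reads plus one read whose last base differs.
reads = ["ACGTA", "ACGTA", "ACGTA", "ACGTT"]
counts = count_kmers(reads, k=4)
histogram = kmer_histogram(counts)
mode = coverage_mode(histogram)
```

On real data the position of the main histogram peak gives a rough k-mer coverage estimate, while a large spike at count 1 hints at sequencing errors. A production tool would use a far more memory-efficient index than an in-memory `Counter`.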

Taxonomy

Topics: Genomics and Phylogenetic Studies · RNA and protein synthesis mechanisms · Chromosomal and Genetic Variations