Cerulean: A hybrid assembly using high throughput short and long reads
Viraj Deshpande, Eric DK Fung, Son Pham, and Vineet Bafna

TL;DR
Cerulean introduces a computationally efficient hybrid genome assembly method that combines short and long reads. It produces high-quality assemblies without extensive long-read error correction and can run on a standard desktop.
Contribution
It presents a novel hybrid assembly algorithm that avoids long-read error correction, improving efficiency and assembly quality over existing methods.
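A hybrid assembler can use long reads without correcting their bases by looking only at which short-read contigs each long read touches, and in what order. The sketch below illustrates that general idea and is not presented as Cerulean's exact algorithm. It assumes each long read has already been aligned to contigs and reduced to an ordered list of contig IDs (the alignment step is not shown). The contig names and the `min_support` threshold are hypothetical. For a repeat contig, the sketch counts which predecessor/successor pairs long reads place on either side of it and keeps the pairs with enough support.

```python
from collections import Counter


def repeat_pairings(repeat, read_paths, min_support=2):
    """Count (predecessor, successor) contig pairs that long reads place around `repeat`.

    read_paths: hypothetical input, one ordered list of contig IDs per long read.
    Returns only the pairs seen in at least `min_support` reads.
    """
    support = Counter()
    for path in read_paths:
        # Skip the first and last entries: they lack a neighbor on one side.
        for i in range(1, len(path) - 1):
            if path[i] == repeat:
                support[(path[i - 1], path[i + 1])] += 1
    return {pair: n for pair, n in support.items() if n >= min_support}


# Toy input: repeat contig "R" with two possible entries (a, b) and exits (c, d).
reads = [
    ["a", "R", "c"],
    ["a", "R", "c", "x"],
    ["b", "R", "d"],
    ["b", "R", "d"],
    ["a", "R", "d"],  # a single, possibly spurious, read
]
print(repeat_pairings("R", reads))  # {('a', 'c'): 2, ('b', 'd'): 2}
```

Requiring a minimum support count is one simple way to tolerate occasional misplacements by noisy long reads. The threshold of 2 here is arbitrary.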
Findings
Achieves comparable assembly quality with fewer computational resources.
Operates efficiently on standard desktop hardware.
Produces high-quality assemblies for bacterial genomes.
Abstract
Genome assembly from high-throughput short-read data arguably remains an unresolvable problem in repetitive genomes: when a repeat is longer than the read length, it becomes difficult to unambiguously connect the regions flanking it. The emergence of third-generation sequencing (Pacific Biosciences) with long reads makes it possible to resolve complicated repeats that short-read data cannot. However, these long reads have a high error rate, and assembling the genome without additional high-quality short reads is an uphill task. Recently, Koren et al. 2012 proposed using high-quality short-read data to correct these long reads and thus make assembly from long reads possible. However, due to the large size of both datasets (short and long reads), error correction of these long reads requires excessively high…
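The repeat problem described in the abstract can be made concrete with a toy example. This sketch is an illustration under stated assumptions, not code from Cerulean. It models error-free reads at full coverage as every substring of a fixed length and uses made-up segments `x`, `y`, `z`, `w` and repeat `R`. Two genomes that differ only in the order of the segments between three copies of `R` yield identical read sets when reads are at most one base longer than the repeat. Reads long enough to span a whole copy of `R` plus one base on each side tell the two genomes apart.

```python
def read_set(genome, read_len):
    """Idealized error-free reads: every substring of length read_len (full coverage)."""
    return {genome[i:i + read_len] for i in range(len(genome) - read_len + 1)}


# Hypothetical toy segments, not taken from the paper.
x, y, z, w = "AAAA", "CCCC", "GGGG", "AAAA"
R = "TGTGT"  # a repeat of length 5, present in three copies

g1 = x + R + y + R + z + R + w  # ... R y R z R ...
g2 = x + R + z + R + y + R + w  # same pieces, y and z swapped

short_len = len(R) + 1  # a read can never touch two different flanking segments
long_len = len(R) + 2   # a read can cover a full repeat copy plus both flanks

print(read_set(g1, short_len) == read_set(g2, short_len))  # True: indistinguishable
print(read_set(g1, long_len) == read_set(g2, long_len))    # False: spanning reads differ
```

Real reads are sampled and, for long reads, noisy, so this overstates how cleanly long reads separate the two genomes. It only shows why read length relative to repeat length matters.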
Taxonomy
Topics: Genomics and Phylogenetic Studies · Microbial Community Ecology and Physiology · Legume Nitrogen Fixing Symbiosis
