HyDA-Vista: Towards Optimal Guided Selection of k-mer Size for Sequence Assembly
Seyed Basir Shariat Razavi, Narjes Sadat Movahedi Tabrizi, Hamidreza Chitsaz, Christina Boucher

TL;DR
HyDA-Vista introduces a method to predict optimal k-mer sizes for genome assembly by leveraging homology information and a novel data structure, improving assembly quality especially for bacterial genomes.
Contribution
The paper presents HyDA-Vista, a genome assembler that uses homology-based predictions of k-mer sizes and introduces the maximal sequence landscape data structure for efficient computation.
Findings
Achieves the best assembly of E. coli among tested assemblers.
Constructs the maximal sequence landscape in O(n + n log n) time.
Effectively uses homology to select k-mer sizes for improved assembly.
Abstract
Motivation: Intimately tied to assembly quality is the complexity of the de Bruijn graph built by the assembler. Thus, there have been many paradigms developed to decrease the complexity of the de Bruijn graph. One obvious combinatorial paradigm for this is to allow the value of k to vary; having a larger value of k where the graph is more complex and a smaller value of k where the graph would likely contain fewer spurious edges and vertices. One open problem that affects the practicality of this method is how to predict the value of k prior to building the de Bruijn graph. We show that optimal values of k can be predicted prior to assembly by using the information contained in a phylogenetically-close genome and therefore, help make the use of multiple values of k practical for genome assembly. Results: We present HyDA-Vista, which is a genome assembler that uses homology…
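To make the trade-off in the Motivation concrete, the following is a minimal Python sketch showing how the choice of k changes the branching of a de Bruijn graph around a repeat. It is not HyDA-Vista's implementation: the toy sequence and the helper names `de_bruijn_graph` and `branching_nodes` are assumptions made for illustration, and the sketch ignores sequencing errors, coverage, and reverse complements.

```python
from collections import defaultdict


def de_bruijn_graph(sequence, k):
    """Build a simple de Bruijn graph: nodes are (k-1)-mers, and every
    k-mer in the sequence adds an edge from its prefix to its suffix."""
    graph = defaultdict(set)
    for i in range(len(sequence) - k + 1):
        kmer = sequence[i:i + k]
        graph[kmer[:-1]].add(kmer[1:])
    return graph


def branching_nodes(graph):
    """Count nodes with more than one distinct successor."""
    return sum(1 for successors in graph.values() if len(successors) > 1)


# Hypothetical toy "genome": the 7-base repeat GATTACA occurs twice,
# each time in a different flanking context.
toy_genome = "ACGTTGCA" + "GATTACA" + "CCTGAAGT" + "GATTACA" + "TTCGGACT"

if __name__ == "__main__":
    for k in (6, 8, 9):
        graph = de_bruijn_graph(toy_genome, k)
        print(f"k={k}: nodes with outgoing edges={len(graph)}, "
              f"branching nodes={branching_nodes(graph)}")
```

In this toy sequence the repeat is 7 bases long, so a graph built with k = 8 (7-mer nodes) still has one branching node at the repeat, while k = 9 (8-mer nodes) spans the repeat and leaves no branching nodes. This is the sense in which a larger k helps where the graph is more complex.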
Taxonomy
Topics: Genomics and Phylogenetic Studies · RNA and protein synthesis mechanisms · Biochemical and Structural Characterization
