TL;DR
This paper introduces a haplotype-aware extension to graph indexes in the variation graph toolkit, improving the identification of likely true haplotypes and enabling scalable indexing of large genomic datasets.
Contribution
It develops a scalable haplotype-aware graph index and an algorithm for simplifying variation graphs for k-mer indexing without losing haplotype information.
Findings
Indexed the 1000 Genomes Project haplotypes, demonstrating that the implementation scales
Used haplotype information to identify which graph paths are more likely to be true haplotypes
Preserved every k-mer present in the haplotypes during graph simplification
Abstract
The variation graph toolkit (VG) represents genetic variation as a graph. Each path in the graph is a potential haplotype, though most paths are unlikely recombinations of true haplotypes. We augment the VG model with haplotype information to identify which paths are more likely to be correct. For this purpose, we develop a scalable implementation of the graph extension of the positional Burrows-Wheeler transform. We demonstrate the scalability of the new implementation by indexing the 1000 Genomes Project haplotypes. We also develop an algorithm for simplifying variation graphs for k-mer indexing without losing any k-mers in the haplotypes.
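The distinction the abstract draws between true haplotypes and recombinant paths can be illustrated with a toy example. The sketch below is not the paper's index: it replaces the positional Burrows-Wheeler transform with a naive linear scan, and the node ids, the `HAPLOTYPES` data, and the function name `count_supporting_haplotypes` are all hypothetical. It only shows the query such an index is meant to answer: how many known haplotypes contain a given path as a contiguous subpath.

```python
from typing import List, Sequence

# Hypothetical toy data: each haplotype is the ordered list of graph node ids
# it visits. Nodes 2/3 and 5/6 stand for the two alleles at two variant sites.
HAPLOTYPES: List[List[int]] = [
    [1, 2, 4, 5, 7],
    [1, 3, 4, 6, 7],
]


def count_supporting_haplotypes(
    path: Sequence[int], haplotypes: Sequence[Sequence[int]]
) -> int:
    """Count the haplotypes that contain `path` as a contiguous subpath.

    This is a naive scan used for illustration only; a real haplotype index
    would answer the same question without scanning every haplotype.
    """
    query = list(path)
    n = len(query)
    if n == 0:
        # The empty path is trivially contained in every haplotype.
        return len(haplotypes)
    count = 0
    for hap in haplotypes:
        nodes = list(hap)
        if any(nodes[i:i + n] == query for i in range(len(nodes) - n + 1)):
            count += 1
    return count


# A path that follows the first haplotype is supported by one haplotype.
observed = count_supporting_haplotypes([2, 4, 5], HAPLOTYPES)
# A path mixing alleles from both haplotypes is valid in the graph
# (edges 2->4 and 4->6 both exist) but matches no known haplotype.
recombinant = count_supporting_haplotypes([2, 4, 6], HAPLOTYPES)
```

In this toy setting, a path with zero supporting haplotypes is a recombination of the known haplotypes, which is the kind of path the haplotype-aware model treats as less likely to be correct.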
