Variant tolerant read mapping using min-hashing
Jens Quedenfeld, Sven Rahmann

TL;DR
VATRAM is a novel read mapper that incorporates genetic variants directly into its index using Min-Hashing, enabling improved accuracy in mapping long, error-prone reads compared to existing tools.
Contribution
The paper introduces VATRAM, a variant-aware read mapper utilizing Min-Hashing for efficient and accurate mapping of long, error-prone reads with known genetic variants.
Findings
VATRAM outperforms BWA in precision and recall under certain conditions.
Incorporating variants directly into the index improves mapping accuracy.
VATRAM is open source and accessible for further research.
Abstract
DNA read mapping is a ubiquitous task in bioinformatics, and many tools have been developed to solve the read mapping problem. However, two trends are changing the landscape of read mapping: First, new sequencing technologies provide very long reads with high error rates (up to 15%). Second, many genetic variants in the population are known, so the reference genome is no longer considered a single string over ACGT, but a complex object containing these variants. Most existing read mappers do not handle these new circumstances appropriately. We introduce a new read mapper prototype called VATRAM that considers variants. It is based on Min-Hashing of q-gram sets of reference genome windows. Min-Hashing is one form of locality-sensitive hashing. The variants are inserted directly into VATRAM's index, which leads to a fast mapping process. Our results show that VATRAM achieves…
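The idea of Min-Hashing q-gram sets of reference windows, with variants added to the indexed set, can be illustrated with a minimal sketch. This is not VATRAM's implementation: the seeded BLAKE2b hash family, the SNP-only variant format (a dict from reference position to alternative base), and all function names and parameter values are assumptions made for illustration.

```python
import hashlib


def qgrams(seq, q):
    """Return the set of all substrings of length q in seq."""
    return {seq[i:i + q] for i in range(len(seq) - q + 1)}


def _hash(qgram, seed):
    """Seeded 64-bit hash; stands in for one member of a hash family (assumption)."""
    data = f"{seed}:{qgram}".encode()
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def minhash_signature(qgram_set, num_hashes):
    """For each hash function, keep the minimum hash value over the set.

    The set must be non-empty, i.e. the sequence must have length >= q.
    """
    return tuple(
        min(_hash(g, seed) for g in qgram_set) for seed in range(num_hashes)
    )


def window_qgrams_with_snps(reference, start, length, q, snps):
    """q-gram set of one reference window, extended by q-grams that contain
    the alternative base of each SNP inside the window.

    snps is a hypothetical format: {reference_position: alternative_base}.
    Each SNP is applied on its own; combinations of nearby SNPs are ignored.
    """
    window = reference[start:start + length]
    grams = qgrams(window, q)
    for pos, alt in snps.items():
        if start <= pos < start + length:
            rel = pos - start
            variant = window[:rel] + alt + window[rel + 1:]
            lo = max(0, rel - q + 1)
            hi = min(len(variant), rel + q)
            # Only q-grams overlapping the variant position are new.
            grams |= qgrams(variant[lo:hi], q)
    return grams


def signature_similarity(sig_a, sig_b):
    """Fraction of matching signature entries; estimates Jaccard similarity."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)


# Usage with made-up data: a read carrying the SNP allele at position 5.
reference = "ACGTACGTTAGCCGATAGGCTA"
snps = {5: "T"}
q, num_hashes = 4, 32
plain = qgrams(reference[0:16], q)
aware = window_qgrams_with_snps(reference, 0, 16, q, snps)
read = reference[:5] + "T" + reference[6:16]
read_sig = minhash_signature(qgrams(read, q), num_hashes)
sim_plain = signature_similarity(read_sig, minhash_signature(plain, num_hashes))
sim_aware = signature_similarity(read_sig, minhash_signature(aware, num_hashes))
```

In this sketch every q-gram of the variant-carrying read is contained in the variant-aware window set, whereas the plain window set lacks the q-grams that overlap the SNP. A real index would additionally group signature entries into buckets for lookup, which is omitted here.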
Taxonomy
Topics: Algorithms and Data Compression · Advanced Data Storage Technologies · Advanced Image and Video Retrieval Techniques
