Fast and sensitive read mapping with approximate seeds and multiple backtracking
Enrico Siragusa, David Weese, Knut Reinert

TL;DR
Masai is a read mapper that combines filtration with approximate seeds and multiple backtracking. The authors report that it is an order of magnitude faster than RazerS 3 and mrFAST, and 2-3 times faster and more accurate than Bowtie 2 and BWA.
Contribution
The paper introduces novel filtration with approximate seeds and a multiple backtracking method, significantly enhancing read mapping speed and accuracy.
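To make the idea of filtration with approximate seeds concrete, here is a minimal Python sketch. It is not Masai's implementation. It assumes Hamming distance (mismatches only), non-overlapping seeds, and a brute-force scan of the reference in place of an index. All function names are hypothetical. The sketch relies on the pigeonhole argument: if a read matches with at most k errors and is split into s seeds, at least one seed matches with at most floor(k / s) errors.

```python
def seed_error_budget(max_errors: int, num_seeds: int) -> int:
    """Errors allowed per seed so that no true match is lost (pigeonhole)."""
    return max_errors // num_seeds


def split_into_seeds(read: str, num_seeds: int) -> list[tuple[int, str]]:
    """Split the read into non-overlapping seeds, returned as (offset, seed)."""
    base, extra = divmod(len(read), num_seeds)
    seeds, start = [], 0
    for i in range(num_seeds):
        length = base + (1 if i < extra else 0)
        seeds.append((start, read[start:start + length]))
        start += length
    return seeds


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def candidate_positions(reference: str, read: str,
                        max_errors: int, num_seeds: int) -> set[int]:
    """Filtration step: read start positions supported by at least one seed hit."""
    budget = seed_error_budget(max_errors, num_seeds)
    candidates = set()
    for offset, seed in split_into_seeds(read, num_seeds):
        # Brute-force scan; a real mapper would query an index here (assumption).
        for pos in range(len(reference) - len(seed) + 1):
            if hamming(reference[pos:pos + len(seed)], seed) <= budget:
                start = pos - offset
                if 0 <= start <= len(reference) - len(read):
                    candidates.add(start)
    return candidates


def verify(reference: str, read: str, start: int, max_errors: int) -> bool:
    """Verification step: check the whole read at a candidate position."""
    return hamming(reference[start:start + len(read)], read) <= max_errors
```

With the same error threshold, fewer and longer seeds each carry a larger error budget than many short exact seeds. Longer seeds occur less often by chance, which is one way to read the claim that approximate seeds raise specificity without losing sensitivity.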
Findings
Masai is an order of magnitude faster than RazerS 3 and mrFAST.
It is 2-3 times faster and more accurate than Bowtie 2 and BWA.
Approximate seeds increase filtration specificity while preserving sensitivity, and multiple backtracking amortizes the cost of searching a large set of seeds.
Abstract
We present Masai, a read mapper representing the state of the art in terms of speed and sensitivity. Our tool is an order of magnitude faster than RazerS 3 and mrFAST, and 2-3 times faster and more accurate than Bowtie 2 and BWA. The novelties of our read mapper are filtration with approximate seeds and a method for multiple backtracking. Approximate seeds, compared to exact seeds, increase filtration specificity while preserving sensitivity. Multiple backtracking amortizes the cost of searching a large set of seeds by taking advantage of the repetitiveness of next-generation sequencing data. Combined, these two methods significantly speed up approximate search on genomic datasets. Masai is implemented in C++ using the SeqAn library. The source code is distributed under the BSD license and binaries for Linux, Mac OS X and Windows can be freely downloaded from…
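The amortization behind multiple backtracking can be illustrated with a small Python sketch. This is one plausible reading of the abstract, not the authors' code. It assumes all seeds have the same length, counts mismatches only, and uses a plain trie of reference k-mers as a stand-in for a real substring index. The names `build_trie` and `multiple_backtracking` are hypothetical. Seeds are stored in their own trie, and both tries are walked in lockstep, so a prefix shared by many seeds is backtracked only once.

```python
def build_trie(items):
    """Build a dict-based trie from (string, payload) pairs.

    Payloads are stored in a list under the key None at the end of each string.
    """
    root = {}
    for text, payload in items:
        node = root
        for ch in text:
            node = node.setdefault(ch, {})
        node.setdefault(None, []).append(payload)
    return root


def multiple_backtracking(reference, seeds, max_errors):
    """Return {seed id: sorted reference positions within max_errors mismatches}."""
    length = len(seeds[0])
    assert all(len(s) == length for s in seeds), "sketch assumes equal-length seeds"

    seed_trie = build_trie((s, i) for i, s in enumerate(seeds))
    # Stand-in for a substring index of the reference (assumption).
    ref_trie = build_trie(
        (reference[p:p + length], p) for p in range(len(reference) - length + 1)
    )
    hits = {i: [] for i in range(len(seeds))}

    def walk(seed_node, ref_node, errors_left):
        if None in seed_node:
            # Both tries are at full depth: report every (seed, position) pair.
            for seed_id in seed_node[None]:
                hits[seed_id].extend(ref_node[None])
            return
        for s_ch, s_child in seed_node.items():
            for r_ch, r_child in ref_node.items():
                cost = 0 if s_ch == r_ch else 1
                if cost <= errors_left:
                    walk(s_child, r_child, errors_left - cost)

    walk(seed_trie, ref_trie, max_errors)
    return {i: sorted(positions) for i, positions in hits.items()}
```

Calling `multiple_backtracking(reference, seeds, 1)` searches every seed in a single traversal. Searching each seed separately would repeat the work for every shared prefix, which matters when reads are highly repetitive.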
Taxonomy
Topics: Genomics and Phylogenetic Studies · Algorithms and Data Compression · Advanced Image and Video Retrieval Techniques
