Fast construction of FM-index for long sequence reads
Heng Li

TL;DR
This paper introduces a fast method for incrementally constructing the FM-index of both short and long sequence reads, for read collections up to the size of a genome, without a separate sorting step.
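Incremental construction means adding reads one at a time to an existing index rather than rebuilding it from scratch. The code below is a minimal sketch of the classic rule for inserting one read into an existing multi-string BWT, walking the read from its last base to its first. It is not ropebwt2's implementation, and it rests on several assumptions. Each read's sentinel `$` is ranked after all existing sentinels (insertion order, not the reverse lexicographical order the paper supports). Symbols compare in ASCII order (`$` < `A` < `C` < `G` < `T`). A plain Python list stands in for the compressed, dynamic sequence a real index would need. The names `insert_read` and `build_bwt` are hypothetical.

```python
def insert_read(bwt: str, read: str) -> str:
    """Insert one read into a multi-string BWT (hypothetical helper, sketch only).

    Assumptions: '$' marks each read's sentinel, the new read's sentinel ranks
    after every existing one (insertion order), and reads contain no '$'.
    """
    b = list(bwt)            # plain list; a real index uses a dynamic compressed structure
    k = bwt.count("$")       # row of the new suffix "$": after all existing sentinel rows
    # Visit suffixes read[i:]+'$' from the shortest (i = len(read)) to the longest.
    for i in range(len(read), 0, -1):
        c = read[i - 1]
        b.insert(k, c)       # the row of suffix read[i:]+'$' is preceded by c
        # Rows starting with a symbol smaller than c; +1 counts the new read's
        # '$' row, whose '$' symbol has not been inserted into b yet.
        less = sum(1 for x in b if x < c) + 1
        # Among suffixes starting with c, the new one follows those whose tail
        # sorts before read[i:]+'$', i.e. the c's occurring in b[:k].
        k = less + b[:k].count(c)
    b.insert(k, "$")         # the full read is preceded by its own sentinel
    return "".join(b)


def build_bwt(reads: list[str]) -> str:
    """Build the BWT of a read collection by repeated insertion (hypothetical helper)."""
    bwt = ""
    for r in reads:
        bwt = insert_read(bwt, r)
    return bwt
```

For example, `build_bwt(["A", "C", "CA"])` returns `"ACA$C$$"`, which matches sorting the suffixes `$0 < $1 < $2 < A$0 < A$2 < C$1 < CA$2` directly. Each insertion costs time proportional to the current BWT length here because of the list scans. Practical tools need an Occ structure that supports both fast rank queries and fast insertion.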
Contribution
It presents the first algorithm that builds the FM-index while implicitly sorting the sequences in reverse (complement) lexicographical order, and it remains practical for reads averaging kilobases in length.
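The following naive, non-incremental construction shows what the target ordering looks like. It sorts every suffix of every read, then counts a pattern with FM-index backward search. The sketch assumes that "reverse lexicographical order" means ranking each read's sentinel by the read's reversed sequence, with ties broken by input index. It ignores the reverse-complement variant. The names `naive_multi_bwt` and `fm_count` are hypothetical. Occ is computed by scanning the BWT rather than with the sampled rank tables a real FM-index uses.

```python
def naive_multi_bwt(reads: list[str], rlo: bool = False) -> str:
    """Multi-string BWT by sorting all suffixes (hypothetical helper, sketch only).

    Read i ends with its own sentinel $_i, smaller than any base. Sentinels are
    ranked by input index, or, if rlo=True, by the reversed read (assumed RLO).
    """
    order = list(range(len(reads)))
    if rlo:
        order.sort(key=lambda i: (reads[i][::-1], i))
    sentinel_rank = {read_id: r for r, read_id in enumerate(order)}
    rows = []
    for i, s in enumerate(reads):
        for j in range(len(s) + 1):
            # Sort key: each base as (1, base), then the sentinel as (0, rank),
            # so a sentinel sorts before any base and sentinels sort by rank.
            key = tuple((1, ch) for ch in s[j:]) + ((0, sentinel_rank[i]),)
            prev = s[j - 1] if j > 0 else "$"  # symbol preceding this suffix
            rows.append((key, prev))
    rows.sort()  # keys are distinct, so ties never fall through to `prev`
    return "".join(prev for _, prev in rows)


def fm_count(bwt: str, pattern: str) -> int:
    """Count occurrences of `pattern` by FM-index backward search (sketch)."""
    C, total = {}, 0
    for a in sorted(set(bwt)):       # C[a] = number of symbols smaller than a
        C[a] = total
        total += bwt.count(a)
    lo, hi = 0, len(bwt)            # current suffix interval [lo, hi)
    for a in reversed(pattern):
        if a not in C:
            return 0
        lo = C[a] + bwt[:lo].count(a)   # Occ(a, lo), computed by scanning
        hi = C[a] + bwt[:hi].count(a)   # Occ(a, hi)
        if lo >= hi:
            return 0
    return hi - lo
```

With `reads = ["ACGT", "ACGA", "TCGA"]`, the first three BWT symbols (the bases preceding each sentinel) are `"TAA"` in input order but `"AAT"` with `rlo=True`, so reads that end alike become adjacent. In both orderings, `fm_count(..., "CG")` returns 3, because the sentinel order changes the BWT string but not the set of matches.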
Findings
Fast indexing of short reads
Practical indexing of long reads averaging kilobases in length
No separate sorting step needed
Abstract
Summary: We present a new method to incrementally construct the FM-index for both short and long sequence reads, up to the size of a genome. It is the first algorithm that can build the index while implicitly sorting the sequences in the reverse (complement) lexicographical order without a separate sorting step. The implementation is among the fastest for indexing short reads and the only one that practically works for reads averaging kilobases in length.
Availability and implementation: https://github.com/lh3/ropebwt2
