Fast construction of FM-index for long sequence reads

Heng Li

arXiv:1406.0426 · q-bio.GN · August 29, 2014


TL;DR

This paper introduces a fast method for incrementally constructing the FM-index of both short and long sequence reads, for read sets up to the size of a genome, without a separate sorting step.
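The incremental idea can be illustrated with a toy multi-string Burrows-Wheeler transform (BWT), the core of an FM-index. This is a sketch under stated assumptions, not ropebwt2's code: it inserts one read at a time by LF-mapping into a plain Python list (ropebwt2 keeps the BWT in a rope data structure, and the paper's insertion scheme may differ, for example by inserting reads in batches), then checks the result against a naive construction that sorts every suffix. The helper names `naive_bwt`, `insert_read` and `incremental_bwt` are hypothetical.

```python
"""Toy multi-string BWT: naive construction vs. one-read-at-a-time insertion.

Assumptions (not taken from the paper): reads use the alphabet {A,C,G,T};
each read ends with its own sentinel '$'; sentinels compare smaller than
every base and are ordered by insertion index; the BWT is a plain list.
"""


def naive_bwt(reads):
    """Reference construction: sort every suffix of every read, then read off
    the symbol that precedes each suffix ('$' for a whole-read suffix)."""
    rows = []
    for i, s in enumerate(reads):
        for p in range(len(s) + 1):
            # Bases sort as (1, base); read i's sentinel sorts as (0, i).
            key = [(1, ch) for ch in s[p:]] + [(0, i)]
            rows.append((key, s[p - 1] if p > 0 else "$"))
    rows.sort(key=lambda row: row[0])
    return "".join(sym for _, sym in rows)


def insert_read(bwt, s):
    """Insert read s into the BWT list in place, walking s from its last base
    to its first and placing one symbol per step via LF-mapping."""
    n_reads = bwt.count("$") + 1  # number of reads once s is added
    # The new sentinel is the largest, so suffix "$" of s ranks right after
    # the existing sentinel rows; its preceding symbol is the last base.
    k = n_reads - 1
    bwt.insert(k, s[-1] if s else "$")
    for j in range(len(s) - 1, -1, -1):
        c = s[j]  # the symbol just placed at row k
        # Rows whose suffix starts with a symbol smaller than c: all n_reads
        # sentinel rows plus every base smaller than c already in the BWT.
        smaller = n_reads + sum(1 for a in bwt if a != "$" and a < c)
        # LF-mapping: suffix c + (suffix at row k) lands after those rows and
        # after every row above k whose BWT symbol is also c.
        k = smaller + bwt[:k].count(c)
        bwt.insert(k, s[j - 1] if j > 0 else "$")
    return bwt


def incremental_bwt(reads):
    """Grow the BWT read by read; earlier reads are never re-sorted."""
    bwt = []
    for s in reads:
        insert_read(bwt, s)
    return "".join(bwt)


if __name__ == "__main__":
    reads = ["ACGT", "ACG", "TTAC", "GATTACA"]
    print(incremental_bwt(reads))
    print(incremental_bwt(reads) == naive_bwt(reads))
```

Each insertion here scans the whole list, so the toy is quadratic; it only shows that a BWT can grow read by read while staying identical to the fully sorted result.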

Contribution

It presents the first algorithm that builds the FM-index while implicitly sorting the sequences in reverse (complement) lexicographical order, with no separate sorting step, and its implementation remains practical for long reads.
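As a reading aid, the snippet below shows what ordering reads in reverse lexicographical order looks like, assuming this means comparing sequences from their last base to their first, and that the "(complement)" variant compares reverse complements instead. It simply sorts, so it illustrates only the target order, not how the paper obtains that order implicitly during construction; `reverse_key` and `reverse_complement_key` are hypothetical helpers.

```python
"""Illustration of the target read orders (assumed interpretation only)."""

# Base-pairing table for A/C/G/T reads.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_key(read):
    """Sort key for reverse lexicographical order: the read spelled backwards."""
    return read[::-1]


def reverse_complement_key(read):
    """Sort key for reverse-complement order: reverse, then complement bases."""
    return read[::-1].translate(COMPLEMENT)


if __name__ == "__main__":
    reads = ["ACGT", "TTAC", "GGCA", "CATG"]
    print(sorted(reads, key=reverse_key))
    # ['GGCA', 'TTAC', 'CATG', 'ACGT']
    print(sorted(reads, key=reverse_complement_key))
    # ['ACGT', 'CATG', 'TTAC', 'GGCA']
```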

Findings

01 Fast indexing of short reads
02 Practical indexing of long reads averaging kilobases in length
03 No separate sorting step needed

Abstract

Summary: We present a new method to incrementally construct the FM-index for both short and long sequence reads, up to the size of a genome. It is the first algorithm that can build the index while implicitly sorting the sequences in the reverse (complement) lexicographical order without a separate sorting step. The implementation is among the fastest for indexing short reads and the only one that works in practice for reads averaging kilobases in length.

Availability and implementation: https://github.com/lh3/ropebwt2

Contact: [email protected]
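The payoff of building the index is fast substring queries over the reads. A minimal sketch of backward search on a multi-read BWT follows, assuming a naive suffix-sorting construction stands in for ropebwt2 and that occurrence counts are found by scanning, whereas a practical FM-index samples these counts so each step takes constant time. `multi_bwt` and `count_occurrences` are hypothetical names.

```python
"""Counting pattern occurrences with a toy FM-index of a read set."""


def multi_bwt(reads):
    """BWT of a read set; each read ends with its own sentinel, and sentinels
    sort before all bases and by read index among themselves."""
    rows = []
    for i, s in enumerate(reads):
        for p in range(len(s) + 1):
            key = [(1, ch) for ch in s[p:]] + [(0, i)]
            rows.append((key, s[p - 1] if p > 0 else "$"))
    rows.sort(key=lambda row: row[0])
    return "".join(sym for _, sym in rows)


def count_occurrences(bwt, pattern):
    """Backward search: number of occurrences of pattern across all reads."""
    # C[c] = number of BWT symbols strictly smaller than c ('$' < 'A' in ASCII)
    C, total = {}, 0
    for c in sorted(set(bwt)):
        C[c] = total
        total += bwt.count(c)
    lo, hi = 0, len(bwt)  # interval [lo, hi) of sorted suffixes
    for c in reversed(pattern):
        if c not in C:
            return 0
        # LF-map both ends: rows prefixed by c + (current match)
        lo = C[c] + bwt[:lo].count(c)
        hi = C[c] + bwt[:hi].count(c)
        if lo >= hi:
            return 0
    return hi - lo


if __name__ == "__main__":
    reads = ["GATTACA", "TACA", "ACGTACGT"]
    index = multi_bwt(reads)
    print(count_occurrences(index, "ACA"))
```

Because every read carries its own sentinel and patterns contain no `$`, a match can never span two reads.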