Bounding the Average Move Structure Query for Faster and Smaller RLBWT Permutations
Nathaniel K. Brown, Ben Langmead

TL;DR
This paper introduces a simplified interval truncation method for move structures in compressed text indexes, achieving optimal average query time, reduced space, and faster construction, especially beneficial for genomics data.
Contribution
It proposes a length-capping splitting scheme that simplifies move structure construction, improves query and construction times, and reduces the space of RLBWT permutations.
Findings
Length capping bounds average move structure query time to optimal.
The method reduces overall representation size by O(r log r) bits.
Experiments show faster construction, lower memory usage, and significant space reduction in genomic data.
Abstract
The move structure represents permutations with long contiguously permuted intervals in compressed space with optimal query time. Move structures have become an important feature of compressed text indexes that use space proportional to the number of Burrows-Wheeler Transform (BWT) runs, often applied in genomics. This is thanks not only to theoretical improvements over past approaches, but also to great cache efficiency and average-case query time in practice, even without the worst-case guarantees provided by the interval-splitting balancing of the original result. In this paper, we show that an even simpler type of splitting, length capping by truncating long intervals, bounds the average move structure query time to optimal while obtaining a better construction time than the traditional approach. This also proves constant query time when amortized over a full traversal of a…
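To make the idea of length capping concrete, the following is a minimal Python sketch of a move structure query before and after truncating long intervals. It is an illustration under stated assumptions, not the authors' implementation: the names `cap_intervals`, `build_dest`, and `move`, the flat-list layout, and the `cap` parameter are all hypothetical, and how the paper actually chooses the cap is not stated in this summary.

```python
from bisect import bisect_right

def cap_intervals(starts, images, n, cap):
    """Split every interval longer than `cap` into pieces of length <= cap.

    starts[k] is the first position of interval k, images[k] is where that
    position maps to, and n is the size of the permutation. `cap` is a
    hypothetical parameter chosen by the caller.
    """
    new_starts, new_images = [], []
    for k, (p, q) in enumerate(zip(starts, images)):
        end = starts[k + 1] if k + 1 < len(starts) else n
        for off in range(0, end - p, cap):
            new_starts.append(p + off)
            new_images.append(q + off)
    return new_starts, new_images

def build_dest(starts, images):
    """dest[k] = index of the interval that contains images[k]."""
    return [bisect_right(starts, q) - 1 for q in images]

def move(i, k, starts, images, dest):
    """Map position i (inside interval k) through the permutation.

    Returns the new position, the interval containing it, and the number
    of fast-forward steps needed to find that interval.
    """
    i2 = images[k] + (i - starts[k])
    k2 = dest[k]
    steps = 0
    # Fast-forward: scan right until interval k2 contains i2.
    while k2 + 1 < len(starts) and starts[k2 + 1] <= i2:
        k2 += 1
        steps += 1
    return i2, k2, steps

# Toy permutation on 10 positions: four singletons and one long interval.
n = 10
starts = [0, 1, 2, 3, 4]
images = [9, 8, 7, 6, 0]
dest = build_dest(starts, images)
print(move(7, 4, starts, images, dest))      # (3, 3, 3)

# Same permutation after capping interval lengths at 2.
c_starts, c_images = cap_intervals(starts, images, n, cap=2)
c_dest = build_dest(c_starts, c_images)
print(move(7, 5, c_starts, c_images, c_dest))  # (3, 3, 1)
```

In this toy case, the long interval maps onto a range that overlaps several short intervals, so an uncapped query needs three fast-forward steps. After capping, each piece has its own destination pointer and the same query needs one step, at the cost of two extra table rows.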
