Lightweight Data Indexing and Compression in External Memory
Paolo Ferragina, Travis Gagie, Giovanni Manzini

TL;DR
This paper introduces lightweight algorithms for computing the Burrows-Wheeler Transform (BWT) and building compressed indexes in external memory. For an input of size n they use only n bits of disk working space, and they access disk data solely through sequential scans, which modern disks serve much faster than random accesses.
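For readers new to the BWT, the transform sorts all cyclic rotations of the input, terminated by a unique sentinel, and outputs the last column. The sketch below is the naive textbook construction, not the paper's lightweight external-memory algorithm. It holds every rotation in memory at once, which is exactly the cost the paper avoids. The function name `bwt` and the choice of `$` as a sentinel that sorts before every other symbol are illustrative assumptions.

```python
def bwt(text: str, sentinel: str = "$") -> str:
    """Burrows-Wheeler Transform of text + sentinel via sorted rotations.

    Naive textbook construction: it materializes all n rotations in memory
    (O(n^2) space), the opposite of a lightweight algorithm.
    Assumes the sentinel occurs nowhere in `text` and sorts before every
    symbol of `text`.
    """
    if sentinel in text:
        raise ValueError("sentinel must not occur in the text")
    s = text + sentinel
    # Row i of the (conceptual) BWT matrix is the rotation starting at i.
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    # The transform is the last column of the sorted matrix.
    return "".join(row[-1] for row in rotations)


if __name__ == "__main__":
    print(bwt("banana"))  # annb$aa
```

The lightweight algorithms described in the paper avoid materializing this matrix. The example only fixes what the output of "computing the BWT" is.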
Contribution
It presents novel lightweight algorithms that use only n bits of disk working space (linear in the input size n) and access disk data solely through sequential scans, improving on all previous approaches, which need asymptotically more disk working space.
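To make "access disk data only via sequential scans" concrete, here is a minimal, hypothetical sketch of a scan-style pass. It reads a BWT file front to back in fixed-size blocks, never seeking, and derives the symbol counts and the C array (first-column offsets) that BWT inversion relies on. This is not the paper's algorithm. The names `count_symbols` and `c_array` and the default `block_size` are assumptions made for illustration.

```python
import io
from collections import Counter
from typing import BinaryIO, Dict


def count_symbols(stream: BinaryIO, block_size: int = 1 << 20) -> Counter:
    """Count byte frequencies in a single left-to-right pass.

    The stream is read in contiguous blocks of `block_size` bytes and never
    seeks, so on a disk file every access is sequential.
    """
    counts: Counter = Counter()
    while True:
        block = stream.read(block_size)
        if not block:  # end of stream
            break
        counts.update(block)  # iterating bytes yields ints 0..255
    return counts


def c_array(counts: Counter) -> Dict[int, int]:
    """C[c] = number of symbols strictly smaller than c (first-column offsets)."""
    offsets: Dict[int, int] = {}
    total = 0
    for sym in sorted(counts):
        offsets[sym] = total
        total += counts[sym]
    return offsets


if __name__ == "__main__":
    # Stand-in for a BWT file on disk; a tiny block size forces several reads.
    bwt_file = io.BytesIO(b"annb$aa")
    counts = count_symbols(bwt_file, block_size=2)
    print(c_array(counts))  # {36: 0, 97: 1, 98: 4, 110: 5}
```

The block size only changes how many reads occur, not the result. A larger value amortizes per-read overhead on real disks.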
Findings
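Algorithms use only n bits of disk working space for an input of size n
Disk data is accessed only via sequential scans, which are much faster than random accesses
New lower bounds on the complexity of computing and inverting the BWT via sequential scans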
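The following back-of-the-envelope sketch gives a rough sense of scale for "only n bits". It compares n bits with a suffix-array-style layout that stores one ⌈log₂ n⌉-bit integer per text position. Constants are ignored, and that comparison layout is an illustrative assumption, not a measurement of any specific prior system. The helper names are hypothetical.

```python
def working_space_bits(n: int) -> tuple[int, int]:
    """Compare n bits against one ceil(log2 n)-bit integer per position.

    Illustrative assumption: constants are ignored, so the lightweight
    figure is exactly n bits and the suffix-array-style figure is
    n * ceil(log2 n) bits.
    """
    if n < 1:
        raise ValueError("input size must be positive")
    # (n - 1).bit_length() == ceil(log2 n) for n >= 1, computed exactly.
    word_bits = max(1, (n - 1).bit_length())
    return n, n * word_bits


def to_mib(bits: int) -> float:
    """Convert a bit count to mebibytes."""
    return bits / 8 / 2**20


if __name__ == "__main__":
    light, sa = working_space_bits(2**30)  # a 1 GiB input, one byte per symbol
    print(f"lightweight: {to_mib(light):.0f} MiB, suffix array: {to_mib(sa):.0f} MiB")
    # lightweight: 128 MiB, suffix array: 3840 MiB
```

Under these assumptions, a 1 GiB input needs about 128 MiB of disk working space instead of about 3.75 GiB, a factor of ⌈log₂ n⌉ = 30.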
Abstract
In this paper we describe algorithms for computing the BWT and for building (compressed) indexes in external memory. The innovative feature of our algorithms is that they are lightweight in the sense that, for an input of size $n$, they use only $n$ bits of disk working space while all previous approaches use $\Theta(n \log n)$ bits of disk working space. Moreover, our algorithms access disk data only via sequential scans, thus they take full advantage of modern disk features that make sequential disk accesses much faster than random accesses. We also present a scan-based algorithm for inverting the BWT that uses $\Theta(n)$ bits of working space, and a lightweight {\em internal-memory} algorithm for computing the BWT which is the fastest in the literature when the available working space is $o(n)$ bits. Finally, we prove {\em lower} bounds on the complexity of computing and inverting…
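The abstract also mentions inverting the BWT. As background, here is a minimal in-memory sketch of the textbook LF-mapping inversion. It is not the paper's scan-based external-memory inversion, since it keeps the whole LF array in RAM. The function name and the `$` sentinel (unique and smallest) are assumptions.

```python
def inverse_bwt(last: str, sentinel: str = "$") -> str:
    """Recover the original text from its BWT with the LF-mapping.

    Assumes `last` is the BWT of text + sentinel, where the sentinel occurs
    exactly once and is the smallest symbol. Keeps the whole LF array in
    memory, so this is an in-memory illustration only.
    """
    if last.count(sentinel) != 1 or min(last) != sentinel:
        raise ValueError("expected exactly one sentinel, smaller than all other symbols")
    n = len(last)
    # C[c]: number of symbols in `last` strictly smaller than c.
    C, total = {}, 0
    for c in sorted(set(last)):
        C[c] = total
        total += last.count(c)
    # LF[i] = C[last[i]] + number of occurrences of last[i] in last[:i].
    seen = {}
    LF = [0] * n
    for i, c in enumerate(last):
        LF[i] = C[c] + seen.get(c, 0)
        seen[c] = seen.get(c, 0) + 1
    # Row 0 is the rotation starting with the sentinel, so last[0] is the
    # final character of the text; each LF step moves one position left.
    out = []
    i = 0
    for _ in range(n - 1):
        out.append(last[i])
        i = LF[i]
    return "".join(reversed(out))


if __name__ == "__main__":
    print(inverse_bwt("annb$aa"))  # banana
```

Each LF step jumps to an essentially arbitrary row, so a direct external-memory version of this loop would perform random accesses. That access pattern is what a scan-based inversion has to avoid.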
Taxonomy
Topics: Algorithms and Data Compression · Advanced Data Storage Technologies · Cellular Automata and Applications
