Compressing Suffix Trees by Path Decompositions

Ruben Becker; Davide Cenzato; Travis Gagie; Sung-Hwan Kim; Ragnar Groot Koerkamp; Giovanni Manzini; Nicola Prezza

arXiv:2506.14734·cs.DS·May 7, 2026

TL;DR

This paper introduces a new suffix array sampling method based on path decompositions of suffix trees, enabling efficient pattern matching in the I/O model with improved space bounds.

Contribution

It presents a novel suffix array sampling technique using path decompositions that improves space efficiency and I/O performance over previous methods.

Findings

01 Bound the number of paths by r, the number of BWT runs.

02 Achieve efficient pattern matching in the I/O model.

03 Improve space bounds from 2r to r in suffix array sampling.
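
To make the quantity r in the findings above concrete, here is a minimal Python sketch that computes the Burrows-Wheeler transform of a small string and counts its runs of equal characters. It is illustrative only and not the paper's construction: the function names `bwt` and `count_runs` are hypothetical, the `$` sentinel convention is an assumption, and the naive rotation sort is only suitable for tiny inputs.

```python
def bwt(text: str, sentinel: str = "$") -> str:
    """Burrows-Wheeler transform via naively sorted rotations.

    Assumption: `sentinel` does not occur in `text` and sorts before
    every character of `text` (true for "$" and lowercase ASCII letters).
    """
    if sentinel in text:
        raise ValueError("text must not contain the sentinel")
    s = text + sentinel
    # Sort all cyclic rotations, then read off the last column.
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rotation[-1] for rotation in rotations)


def count_runs(s: str) -> int:
    """Number of maximal runs of equal characters in s (r when s is a BWT)."""
    return sum(1 for i, c in enumerate(s) if i == 0 or c != s[i - 1])


if __name__ == "__main__":
    transformed = bwt("banana")
    print(transformed)              # annb$aa
    print(count_runs(transformed))  # 5
```

For `banana`, the transform `annb$aa` splits into the runs `a`, `nn`, `b`, `$`, `aa`, so r = 5. Highly repetitive inputs tend to yield far fewer runs than characters, which is why space bounds stated in terms of r are attractive.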

Abstract

The suffix tree is arguably the most fundamental data structure on strings: introduced by Weiner (SWAT 1973) and McCreight (JACM 1976), it allows solving a myriad of computational problems on strings in linear time. Motivated by its large space usage, subsequent research focused first on reducing its size by a constant factor via Suffix Arrays, and later on reaching space proportional to the size of the compressed string. Modern compressed indexes, such as the $r$-index (Gagie et al., SODA 2018), fit in space proportional to $r$, the number of runs in the Burrows-Wheeler transform (a strong and universal repetitiveness measure). These advances, however, came with a price: while modern compressed indexes boast optimal bounds in the RAM model, they are often orders of magnitude slower than uncompressed counterparts in practice due to catastrophic cache locality. This reality gap…
