On the Use of Suffix Arrays for Memory-Efficient Lempel-Ziv Data Compression
Artur Ferreira, Arlindo Oliveira, Mario Figueiredo

TL;DR
This paper introduces two new suffix array-based algorithms for Lempel-Ziv compression that significantly reduce memory usage while maintaining efficiency, making them suitable for memory-constrained environments like embedded systems.
Contribution
The paper presents novel suffix array algorithms for LZ encoding that require no decoder modifications and have predictable, low memory requirements, outperforming suffix tree-based methods in memory efficiency.
Findings
The proposed algorithms use 3 to 5 times less memory than their suffix tree counterparts
Memory usage is independent of text size, allowing pre-allocation
Algorithms are applicable to text retrieval and substring search
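To make the substring-search finding concrete, here is a minimal Python sketch of how a suffix array can answer "where does this pattern occur?" with two binary searches. It is an illustration only, not the paper's algorithm: the helper names `build_suffix_array` and `find_occurrences` are hypothetical, and the naive sort-based construction is chosen for brevity rather than speed.

```python
def build_suffix_array(text: str) -> list:
    # Naive construction: sort suffix start positions by the suffix they begin.
    # Assumption for this sketch only; practical builders are much faster.
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_occurrences(text: str, sa: list, pattern: str) -> list:
    # All suffixes starting with `pattern` form one contiguous block in `sa`.
    m = len(pattern)

    # Lower bound: first suffix whose m-character prefix is >= pattern.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    first = lo

    # Upper bound: first suffix whose m-character prefix is > pattern.
    hi = len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid

    # Return the start positions of the matches in text order.
    return sorted(sa[first:lo])
```

The index is a single array of integers, one per text position, which is the source of the memory advantage over a pointer-based suffix tree.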
Abstract
Much research has been devoted to optimizing algorithms of the Lempel-Ziv (LZ) 77 family, both in terms of speed and memory requirements. Binary search trees and suffix trees (ST) are data structures that have often been used for this purpose, as they allow fast searches at the expense of memory usage. In recent years, there has been interest in suffix arrays (SA), due to their simplicity and low memory requirements. One key point is that an SA can solve the substring problem almost as efficiently as an ST, using less memory. This paper proposes two new SA-based algorithms for LZ encoding, which require no modifications to the decoder. Experimental results on standard benchmarks show that our algorithms, though not faster, use 3 to 5 times less memory than their ST counterparts. Another important feature of our SA-based algorithms is that the amount of memory is independent of the…
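The idea of SA-based LZ77 encoding with an unmodified decoder can be sketched as follows. This is a toy illustration under stated assumptions, not either of the paper's two algorithms: the token layout `(position, length, literal)`, the parameters `window_size` and `max_len`, and all function names are hypothetical, and the suffix array is rebuilt naively at every step purely for clarity.

```python
def build_suffix_array(window: str) -> list:
    # Naive sort-based construction (assumption: acceptable for a small window).
    return sorted(range(len(window)), key=lambda i: window[i:])


def _bound(window, sa, lo, hi, k, ch, strict):
    # First index in [lo, hi) whose suffix has k-th character >= ch
    # (or > ch when strict). Suffixes shorter than k + 1 sort first.
    while lo < hi:
        mid = (lo + hi) // 2
        pos = sa[mid] + k
        c = window[pos] if pos < len(window) else ""
        if c < ch or (strict and c == ch):
            lo = mid + 1
        else:
            hi = mid
    return lo


def longest_match(window, sa, lookahead):
    # Narrow the suffix-array interval one lookahead character at a time.
    lo, hi = 0, len(sa)
    best_pos, best_len = 0, 0
    for k, ch in enumerate(lookahead):
        new_lo = _bound(window, sa, lo, hi, k, ch, False)
        new_hi = _bound(window, sa, new_lo, hi, k, ch, True)
        if new_lo == new_hi:
            break  # no suffix in the window extends the match
        lo, hi = new_lo, new_hi
        best_pos, best_len = sa[lo], k + 1
    return best_pos, best_len


def lz77_encode(text, window_size=16, max_len=8):
    tokens, i = [], 0
    while i < len(text):
        window = text[max(0, i - window_size):i]
        sa = build_suffix_array(window)  # size bounded by window_size
        lookahead = text[i:i + max_len]
        # Keep the last lookahead symbol free so a literal always follows.
        pos, length = longest_match(window, sa, lookahead[:-1])
        tokens.append((pos, length, text[i + length]))
        i += length + 1
    return tokens


def lz77_decode(tokens, window_size=16):
    # Plain LZ77-style decoder: it copies from the window and never
    # needs to know how the encoder found its matches.
    out = []
    for pos, length, literal in tokens:
        start = max(0, len(out) - window_size)
        out.extend(out[start + pos:start + pos + length])
        out.append(literal)
    return "".join(out)
```

Because the suffix array only ever covers the fixed-size window, its length is bounded by `window_size` whatever the input length, which is the property that makes pre-allocation possible.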
