On the Use of Suffix Arrays for Memory-Efficient Lempel-Ziv Data Compression

Artur Ferreira; Arlindo Oliveira; Mario Figueiredo

arXiv:0903.4251·cs.DS·November 17, 2016

TL;DR

This paper introduces two new suffix array-based algorithms for Lempel-Ziv (LZ77) compression that use 3 to 5 times less memory than suffix tree-based encoders, though they are not faster, making them suitable for memory-constrained environments such as embedded systems.

Contribution

The paper presents novel suffix array algorithms for LZ encoding that require no decoder modifications and have predictable, low memory requirements, outperforming suffix tree-based methods in memory efficiency.

Findings

01. The algorithms use 3 to 5 times less memory than their suffix tree counterparts.

02. Memory usage is independent of text size, allowing pre-allocation.

03. The algorithms are applicable to text retrieval and substring search.

Abstract

Much research has been devoted to optimizing algorithms of the Lempel-Ziv (LZ) 77 family, both in terms of speed and memory requirements. Binary search trees and suffix trees (ST) are data structures that have often been used for this purpose, as they allow fast searches at the expense of memory usage. In recent years, there has been interest in suffix arrays (SA), due to their simplicity and low memory requirements. One key point is that an SA can solve the substring problem almost as efficiently as an ST, using less memory. This paper proposes two new SA-based algorithms for LZ encoding, which require no modifications to the decoder. Experimental results on standard benchmarks show that our algorithms, though not faster, use 3 to 5 times less memory than the ST counterparts. Another important feature of our SA-based algorithms is that the amount of memory is independent of the…
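To make the idea concrete, the following is a minimal sketch of how a suffix array over the dictionary window can drive the longest-match search of an LZ77 encoder. It is not one of the paper's two algorithms, which the truncated abstract does not describe; the function names, the (distance, length, literal) token layout, the window sizes, and the naive sort-based construction are all assumptions chosen for illustration. The decoder is an ordinary LZ77 decoder, which reflects the point that an SA-based encoder needs no decoder changes.

```python
from typing import List, Tuple


def build_suffix_array(window: bytes) -> List[int]:
    # Naive construction (assumption, for brevity): sort the start positions
    # of all suffixes lexicographically. Storage is one integer per byte.
    return sorted(range(len(window)), key=lambda i: window[i:])


def common_prefix_len(a: bytes, b: bytes) -> int:
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def longest_match(window: bytes, sa: List[int], lookahead: bytes) -> Tuple[int, int]:
    # Binary search for where `lookahead` would be inserted among the sorted
    # suffixes; the suffix sharing the longest prefix with it is a neighbor.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if window[sa[mid]:] < lookahead:
            lo = mid + 1
        else:
            hi = mid
    best_pos, best_len = 0, 0
    for k in (lo - 1, lo):
        if 0 <= k < len(sa):
            n = common_prefix_len(window[sa[k]:], lookahead)
            if n > best_len:
                best_pos, best_len = sa[k], n
    return best_pos, best_len


def lz77_encode(data: bytes, window_size: int = 16,
                lookahead_size: int = 8) -> List[Tuple[int, int, int]]:
    # Hypothetical token layout: (distance, length, literal byte).
    tokens = []
    i = 0
    while i < len(data):
        start = max(0, i - window_size)
        window = data[start:i]
        sa = build_suffix_array(window)  # rebuilt per token, for simplicity only
        pos, length = longest_match(window, sa, data[i:i + lookahead_size])
        length = min(length, len(data) - i - 1)  # always leave a literal byte
        distance = i - (start + pos) if length > 0 else 0
        tokens.append((distance, length, data[i + length]))
        i += length + 1
    return tokens


def lz77_decode(tokens: List[Tuple[int, int, int]]) -> bytes:
    # Standard LZ77 decoding: copy `length` bytes from `distance` back,
    # then append the literal. Nothing here knows about suffix arrays.
    out = bytearray()
    for distance, length, literal in tokens:
        for _ in range(length):
            out.append(out[-distance])
        out.append(literal)
    return bytes(out)


sample = b"abracadabra abracadabra abracadabra"
assert lz77_decode(lz77_encode(sample)) == sample
```

Because the suffix array holds exactly one integer per window position, its size depends only on the chosen window length and not on the length of the input, which is consistent with the pre-allocation property listed in the findings. In this sketch matches are restricted to the window and do not extend into the lookahead buffer, a simplification rather than a property of LZ77 itself.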
