Time and Memory Efficient Lempel-Ziv Compression Using Suffix Arrays
Artur Ferreira, Arlindo Oliveira, Mario Figueiredo

TL;DR
This paper introduces faster, low-memory algorithms for LZ77 compression using suffix arrays, outperforming tree-based methods in speed and memory efficiency on large datasets, with applications in text classification.
Contribution
It presents improved suffix array-based algorithms for LZ77 encoding that are faster and use less memory than previous methods, suitable for large-scale data compression.
Findings
SA-based encoders are faster than tree-based encoders on benchmark files.
They also require less memory than tree-based encoders, which makes them suitable for large datasets.
They enable efficient text classification and indexing tasks.
Abstract
The well-known dictionary-based algorithms of the Lempel-Ziv (LZ) 77 family are the basis of several universal lossless compression techniques. These algorithms are asymmetric regarding encoding/decoding time and memory requirements, with the former being much more demanding, since it involves repeated pattern searching. In the past years, considerable attention has been devoted to the problem of finding efficient data structures to support these searches, aiming at optimizing the encoders in terms of speed and memory. Hash tables, binary search trees and suffix trees have been widely used for this purpose, as they allow fast search at the expense of memory. Some recent research has focused on suffix arrays (SA), due to their low memory requirements and linear construction algorithms. Previous work has shown how the LZ77 decomposition can be computed using a single SA or an SA with an auxiliary array with the longest common prefix…
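To make the idea of searching with a suffix array concrete, here is a minimal Python sketch of an LZ77-style encoder that looks up the longest match in the sliding-window dictionary through a suffix array. It is an illustration under stated assumptions, not the authors' algorithm: the function names (`build_suffix_array`, `longest_match`, `lz77_encode`, `lz77_decode`), the `(offset, length, literal)` token layout, and the window sizes are all hypothetical. The suffix array is built by naive sorting and rebuilt at every step for clarity, so it does not reflect the linear-time construction mentioned in the abstract.

```python
def build_suffix_array(text):
    # Naive construction by sorting suffixes; fine for a sketch,
    # but far slower than the linear-time algorithms the abstract mentions.
    return sorted(range(len(text)), key=lambda i: text[i:])


def _bound(dictionary, sa, lo, hi, k, c, strict):
    # Binary search in sa[lo:hi] on the character at offset k of each suffix.
    # Returns the first index whose character is >= c (or > c when strict).
    # A suffix shorter than k + 1 is treated as "", which sorts first.
    while lo < hi:
        mid = (lo + hi) // 2
        p = sa[mid] + k
        ch = dictionary[p] if p < len(dictionary) else ""
        if ch < c or (strict and ch == c):
            lo = mid + 1
        else:
            hi = mid
    return lo


def longest_match(dictionary, sa, pattern):
    # Narrow the suffix-array interval one pattern character at a time.
    # Returns (position in dictionary, match length); (0, 0) if no match.
    lo, hi = 0, len(sa)
    pos, length = 0, 0
    for k, c in enumerate(pattern):
        new_lo = _bound(dictionary, sa, lo, hi, k, c, False)
        new_hi = _bound(dictionary, sa, new_lo, hi, k, c, True)
        if new_lo == new_hi:  # no suffix extends the match by c
            break
        lo, hi = new_lo, new_hi
        pos, length = sa[lo], k + 1
    return pos, length


def lz77_encode(data, window=16, lookahead=8):
    # Emits (offset, length, literal) tokens; matches lie fully inside
    # the dictionary (the last `window` symbols already encoded).
    tokens = []
    i = 0
    while i < len(data):
        start = max(0, i - window)
        dictionary = data[start:i]
        sa = build_suffix_array(dictionary)
        # Reserve one symbol so every token can carry a literal.
        max_len = min(lookahead, len(data) - i) - 1
        pos, length = longest_match(dictionary, sa, data[i:i + max_len])
        offset = i - (start + pos) if length else 0
        tokens.append((offset, length, data[i + length]))
        i += length + 1
    return tokens


def lz77_decode(tokens):
    # Decoding only copies symbols, which is why it is much cheaper
    # than encoding.
    out = []
    for offset, length, literal in tokens:
        begin = len(out) - offset
        for j in range(length):
            out.append(out[begin + j])
        out.append(literal)
    return "".join(out)
```

A round trip such as `lz77_decode(lz77_encode("abracadabra abracadabra"))` should return the input unchanged. The decoder needs no search structure at all, which illustrates the encoder/decoder asymmetry the abstract describes.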
Taxonomy
Topics: Algorithms and Data Compression · Advanced Data Compression Techniques · Network Packet Processing and Optimization
