LZ-Compressed String Dictionaries
Julian Arz, Johannes Fischer

TL;DR
This paper compresses string dictionaries with the Lempel-Ziv (LZ78) algorithm. Compression ratios often beat the existing alternatives, especially on dictionaries containing many repeated substrings, while query times remain competitive.
Contribution
It applies LZ78 compression to string dictionaries and often achieves better compression ratios than existing alternatives, especially on dictionaries with many repeated substrings.
Findings
Often achieves better compression ratios than existing alternatives, especially on dictionaries with many repeated substrings
Maintains competitive query times
Validated experimentally on dictionaries of up to 1.5 GB of uncompressed text
Abstract
We show how to compress string dictionaries using the Lempel-Ziv (LZ78) data compression algorithm. Our approach is validated experimentally on dictionaries of up to 1.5 GB of uncompressed text. We achieve compression ratios often outperforming the existing alternatives, especially on dictionaries containing many repeated substrings. Our query times remain competitive.
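To make the underlying compressor concrete, here is a minimal sketch of the textbook LZ78 parse in Python. It is not the authors' dictionary data structure, which the abstract does not describe. The function names `lz78_compress` and `lz78_decompress`, and the use of an empty character to mark a final incomplete phrase, are assumptions made for this illustration only.

```python
def lz78_compress(text):
    """Split text into LZ78 phrases.

    Each phrase is a pair (parent_id, char): the id of the longest
    previously seen phrase that is a prefix of the remaining input,
    followed by the next character. Id 0 denotes the empty phrase.
    """
    trie = {}      # (parent_id, char) -> phrase id
    phrases = []
    current = 0    # id of the phrase matched so far
    for ch in text:
        key = (current, ch)
        if key in trie:
            # Extend the current match along an existing trie edge.
            current = trie[key]
        else:
            # New phrase: the matched phrase plus one fresh character.
            phrases.append((current, ch))
            trie[key] = len(phrases)
            current = 0
    if current != 0:
        # Input ended inside a known phrase (illustrative convention:
        # emit it with an empty character).
        phrases.append((current, ""))
    return phrases


def lz78_decompress(phrases):
    """Rebuild the original text from the list of LZ78 phrases."""
    decoded = [""]  # decoded[i] is the string for phrase id i
    for parent_id, ch in phrases:
        decoded.append(decoded[parent_id] + ch)
    return "".join(decoded[1:])


if __name__ == "__main__":
    sample = "abababab"
    parsed = lz78_compress(sample)
    print(parsed)
    print(lz78_decompress(parsed) == sample)
```

Repeated substrings are absorbed into ever longer phrases, so repetitive input yields fewer phrases per character. This is consistent with the abstract's remark that the gains are largest on dictionaries containing many repeated substrings.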
Taxonomy
Topics: Algorithms and Data Compression · DNA and Biological Computing · Natural Language Processing Techniques
