LZ-Compressed String Dictionaries

Julian Arz; Johannes Fischer

arXiv:1305.0674 · cs.DS · May 6, 2013 · 2 cites

TL;DR

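This paper shows how to compress string dictionaries with the Lempel-Ziv (LZ78) algorithm. On dictionaries of up to 1.5 GB of uncompressed text, the compression ratios often beat the existing alternatives, especially when the strings contain many repeated substrings, while query times remain competitive.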

Contribution

It applies LZ78 compression to string dictionaries, often outperforming existing alternatives in compression ratio, especially on dictionaries containing many repeated substrings.

Findings

01 Compression ratios often outperform the existing alternatives, especially on dictionaries containing many repeated substrings

02 Query times remain competitive

03 Validated experimentally on dictionaries of up to 1.5 GB of uncompressed text

Abstract

We show how to compress string dictionaries using the Lempel-Ziv (LZ78) data compression algorithm. Our approach is validated experimentally on dictionaries of up to 1.5 GB of uncompressed text. We achieve compression ratios often outperforming the existing alternatives, especially on dictionaries containing many repeated substrings. Our query times remain competitive.
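
To make the idea concrete, here is a minimal sketch of textbook LZ78 parsing applied to a small string dictionary. It is not the authors' data structure, which also has to answer queries; it only illustrates why shared substrings lead to fewer phrases. The function names, the choice to sort and concatenate the strings, and the `\0` terminator are all assumptions made for this illustration.

```python
def lz78_parse(text):
    """Textbook LZ78: split text into phrases (previous_phrase_id, next_char)."""
    trie = {}      # (parent_phrase_id, char) -> phrase_id; id 0 is the empty phrase
    phrases = []
    node = 0
    for ch in text:
        key = (node, ch)
        if key in trie:
            node = trie[key]          # keep extending the longest known phrase
        else:
            phrases.append((node, ch))
            trie[key] = len(phrases)  # new phrase ids start at 1
            node = 0
    if node != 0:
        # The text ended inside an already known phrase; emit it with no new char.
        phrases.append((node, ""))
    return phrases


def lz78_decode(phrases):
    """Rebuild the text by expanding each phrase from its parent phrase."""
    expanded = [""]
    for parent, ch in phrases:
        expanded.append(expanded[parent] + ch)
    return "".join(expanded[1:])


def compress_dictionary(strings, sep="\0"):
    """Hypothetical helper: concatenate sorted, terminated strings and parse them."""
    text = "".join(s + sep for s in sorted(set(strings)))
    return lz78_parse(text)


def decompress_dictionary(phrases, sep="\0"):
    """Hypothetical helper: invert compress_dictionary."""
    return lz78_decode(phrases).split(sep)[:-1]


words = ["compress", "compressed", "compression", "compressor"]
phrases = compress_dictionary(words)
text_length = sum(len(w) + 1 for w in words)
print(len(phrases), "phrases for", text_length, "characters")
```

Because the four words share the prefix "compress", later phrases reuse earlier ones and the parse needs fewer phrases than there are characters. This sketch decodes the whole text at once, which sidesteps the question of how to support efficient lookups over the compressed form.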

Taxonomy

Topics: Algorithms and Data Compression · DNA and Biological Computing · Natural Language Processing Techniques