Reordering Rows for Better Compression: Beyond the Lexicographic Order

Daniel Lemire; Owen Kaser; Eduardo Gutarra

arXiv:1207.2189 · cs.DB · February 4, 2014


TL;DR

This paper introduces new heuristics for reordering database rows to enhance compression efficiency beyond traditional lexicographic sorting, achieving significant improvements in run-length encoding and prefix coding.
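To make the run-count objective concrete, here is a minimal Python sketch. It assumes each column is run-length encoded on its own and that the cost of an ordering is the total number of (value, length) runs across all columns. The function names and the toy table are hypothetical illustrations, not code from the paper.

```python
from typing import List, Sequence, Tuple

Row = Tuple[str, ...]


def rle_column(rows: Sequence[Row], col: int) -> List[Tuple[str, int]]:
    """Run-length encode one column as (value, run length) pairs."""
    runs: List[Tuple[str, int]] = []
    for row in rows:
        value = row[col]
        if runs and runs[-1][0] == value:
            runs[-1] = (value, runs[-1][1] + 1)  # same value: extend the current run
        else:
            runs.append((value, 1))  # value changed: start a new run
    return runs


def count_runs(rows: Sequence[Row]) -> int:
    """Total number of runs when every column is encoded independently."""
    if not rows:
        return 0
    return sum(len(rle_column(rows, col)) for col in range(len(rows[0])))


# Hypothetical toy table with three columns.
table = [
    ("a", "x", "1"),
    ("b", "y", "2"),
    ("a", "x", "2"),
    ("b", "y", "1"),
    ("a", "y", "1"),
]

print(count_runs(table))             # 12 runs in the original order
print(count_runs(sorted(table)))     # 8 runs after lexicographic sorting
print(rle_column(sorted(table), 0))  # [('a', 3), ('b', 2)]
```

In this example, sorting collapses the first two columns to two runs each, while the last column actually gets slightly worse (3 runs to 4). That leftover slack in later columns is the kind of room that orderings beyond the lexicographic one can exploit.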

Contribution

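It presents two novel row-ordering heuristics: Multiple Lists, a variant of the Nearest Neighbor traveling-salesman heuristic that gives up some compression for a large speedup on very large tables, and Vortex, aimed at compression schemes that reward long runs rather than few runs.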
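One way to state what "optimizing the row order" means for run-length encoding: take rows r_1, ..., r_n over c columns, and let d_H be the Hamming distance (the number of columns on which two rows differ). Each column contributes one run for its first row plus one more run every time its value changes between consecutive rows, so summing over columns gives:

```latex
\mathrm{runs}(r_1, \dots, r_n) \;=\; c \;+\; \sum_{i=1}^{n-1} d_H(r_i, r_{i+1}),
\qquad
d_H(r, s) \;=\; \bigl|\{\, j : r[j] \neq s[j] \,\}\bigr|
```

Minimizing the number of runs is therefore a shortest Hamiltonian path problem under Hamming distance, which is why traveling salesman heuristics apply. This identity covers the few-runs objective only; the abstract notes that some schemes instead reward long runs, which is the case Vortex targets.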

Findings

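01. Improves run-length encoding compression by up to a factor of 3, on top of the gains from lexicographic sorting.
02. Improves prefix coding compression by up to 80%, also on top of lexicographic sorting.
03. The new heuristics can outperform lexicographic sorting, but the best choice depends on the compression scheme (few runs versus long runs) and on table size.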
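Prefix coding can be illustrated with a similarly small sketch. The model here is an assumption for illustration, not the paper's exact encoder. Each row is stored as the number of leading columns it shares with the previous row, plus its remaining values, and the cost is the number of stored values. Under this model, a row saves space only where its leading columns repeat the previous row, which suggests why long runs in the first columns matter for this scheme. All names are hypothetical.

```python
from typing import List, Sequence, Tuple

Row = Tuple[str, ...]


def shared_prefix(a: Row, b: Row) -> int:
    """Number of leading columns on which rows a and b agree."""
    k = 0
    while k < len(a) and k < len(b) and a[k] == b[k]:
        k += 1
    return k


def prefix_encode(rows: Sequence[Row]) -> List[Tuple[int, Row]]:
    """Encode each row as (shared prefix length, remaining values)."""
    encoded: List[Tuple[int, Row]] = []
    prev: Row = ()  # the first row shares nothing with a predecessor
    for row in rows:
        k = shared_prefix(prev, row)
        encoded.append((k, row[k:]))
        prev = row
    return encoded


def prefix_cost(rows: Sequence[Row]) -> int:
    """Count the column values that must actually be stored."""
    return sum(len(suffix) for _, suffix in prefix_encode(rows))


# Hypothetical toy table with three columns.
table = [
    ("a", "x", "1"),
    ("b", "y", "2"),
    ("a", "x", "2"),
    ("b", "y", "1"),
    ("a", "y", "1"),
]

print(prefix_cost(table))          # 15: no row shares a prefix with its predecessor
print(prefix_cost(sorted(table)))  # 10 after lexicographic sorting
```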

Abstract

Sorting database tables before compressing them improves the compression rate. Can we do better than the lexicographical order? For minimizing the number of runs in a run-length encoding compression scheme, the best approaches to row-ordering are derived from traveling salesman heuristics, although there is a significant trade-off between running time and compression. A new heuristic, Multiple Lists, which is a variant on Nearest Neighbor that trades off compression for a major running-time speedup, is a good option for very large tables. However, for some compression schemes, it is more important to generate long runs rather than few runs. For this case, another novel heuristic, Vortex, is promising. We find that we can improve run-length encoding up to a factor of 3 whereas we can improve prefix coding by up to 80%: these gains are on top of the gains due to lexicographically sorting…
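The abstract ties the best run-minimizing orderings to traveling salesman heuristics. As a hedged illustration, the sketch below implements the plain Nearest Neighbor heuristic with Hamming distance between rows. It is not Multiple Lists, whose internals are not described here, and the function names and toy table are hypothetical. Each step scans every unvisited row, so the running time is O(n^2 c) for n rows and c columns. That quadratic cost is what makes a faster variant such as Multiple Lists attractive on very large tables.

```python
import itertools
from typing import List, Sequence, Tuple

Row = Tuple[int, ...]


def hamming(a: Row, b: Row) -> int:
    """Number of columns on which two rows differ."""
    return sum(x != y for x, y in zip(a, b))


def count_runs(rows: Sequence[Row]) -> int:
    """Total runs under column-wise RLE: c + sum of consecutive Hamming distances."""
    if not rows:
        return 0
    return len(rows[0]) + sum(hamming(rows[i - 1], rows[i]) for i in range(1, len(rows)))


def nearest_neighbor_order(rows: Sequence[Row]) -> List[Row]:
    """Greedy Nearest Neighbor tour: always move to the closest unvisited row."""
    if not rows:
        return []
    remaining = list(rows)
    order = [remaining.pop(0)]  # start from the first row given
    while remaining:
        last = order[-1]
        dists = [hamming(last, r) for r in remaining]
        best = dists.index(min(dists))  # ties go to the earliest remaining row
        order.append(remaining.pop(best))
    return order


# Hypothetical table: all 3-bit rows, produced in lexicographic order.
table = list(itertools.product((0, 1), repeat=3))

print(count_runs(table))                          # 14 runs in lexicographic order
print(count_runs(nearest_neighbor_order(table)))  # 10 runs after Nearest Neighbor
```

On this toy input the greedy tour happens to visit the rows in a Gray-code-like order, where consecutive rows differ in exactly one column, which is the minimum possible for distinct rows.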
