Improving Table Compression with Combinatorial Optimization
Adam L. Buchsbaum, Glenn S. Fowler, Raffaele Giancarlo

TL;DR
This paper introduces new algorithms for table compression that optimize partitioning and column ordering, achieving significant improvements over gzip and previous methods through theoretical insights and experimental validation.
Contribution
It presents the first online training algorithms for table compression applicable to individual files and an offline reordering method based on the asymmetric TSP, enhancing compression rates.
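To make the link between column reordering and the asymmetric TSP concrete, here is a minimal sketch, not the authors' algorithm. It assumes rows are fixed-width `bytes` records, uses `zlib` as a stand-in for gzip, and defines a hypothetical edge weight `w[i][j]` as the extra compressed bytes needed when column `j` follows column `i`. A greedy nearest-neighbor walk stands in for a real ATSP solver. The function names are illustrative only.

```python
import zlib


def csize(data):
    """Compressed size in bytes, with zlib standing in for gzip."""
    return len(zlib.compress(data, 9))


def column(rows, j):
    """Extract column j (one byte per row) from fixed-width byte records."""
    return bytes(row[j] for row in rows)


def greedy_column_order(rows):
    """Return a column permutation built by a nearest-neighbor walk.

    Assumption: w[i][j] approximates the cost of placing column j right
    after column i. The weights are asymmetric in general, which is why
    the ordering problem resembles an asymmetric TSP.
    """
    n = len(rows[0])
    cols = [column(rows, j) for j in range(n)]
    solo = [csize(c) for c in cols]
    w = [
        [csize(cols[i] + cols[j]) - solo[i] if i != j else 0 for j in range(n)]
        for i in range(n)
    ]
    # Start from the column that compresses best on its own.
    start = min(range(n), key=lambda j: solo[j])
    order = [start]
    remaining = set(range(n)) - {start}
    while remaining:
        last = order[-1]
        # Pick the cheapest successor; break ties by column index.
        nxt = min(remaining, key=lambda j: (w[last][j], j))
        order.append(nxt)
        remaining.remove(nxt)
    return order
```

The returned permutation would be applied to every row before partitioning. A greedy walk gives no optimality guarantee, so it only illustrates the shape of the problem.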
Findings
Online algorithms compress 35-55% better than gzip.
Offline column reordering adds up to 20% further improvement over partitioning alone.
A variation of the problem is proven MAX-SNP hard.
Abstract
We study the problem of compressing massive tables within the partition-training paradigm introduced by Buchsbaum et al. [SODA'00], in which a table is partitioned by an off-line training procedure into disjoint intervals of columns, each of which is compressed separately by a standard, on-line compressor like gzip. We provide a new theory that unifies previous experimental observations on partitioning and heuristic observations on column permutation, all of which are used to improve compression rates. Based on the theory, we devise the first on-line training algorithms for table compression, which can be applied to individual files, not just continuously operating sources; and also a new, off-line training algorithm, based on a link to the asymmetric traveling salesman problem, which improves on prior work by rearranging columns prior to partitioning. We demonstrate these results…
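The partition-training idea in the abstract can be sketched as follows. This is an illustration under stated assumptions, not the paper's training procedure: rows are fixed-width `bytes` records, `zlib` stands in for gzip, and a plain dynamic program over contiguous column intervals picks the split with the smallest total compressed size. The names `interval_cost` and `optimal_partition` are hypothetical.

```python
import zlib


def interval_cost(rows, start, end):
    """Compressed size in bytes of columns [start, end), serialized row by row."""
    data = b"".join(row[start:end] for row in rows)
    return len(zlib.compress(data, 9))


def optimal_partition(rows):
    """Return (total_cost, intervals) minimizing the summed compressed size.

    best[j] is the cheapest cost of compressing columns [0, j) as a
    sequence of separately compressed contiguous intervals.
    """
    n = len(rows[0])
    best = [0] + [None] * n
    cut = [0] * (n + 1)
    for j in range(1, n + 1):
        for i in range(j):
            cost = best[i] + interval_cost(rows, i, j)
            if best[j] is None or cost < best[j]:
                best[j] = cost
                cut[j] = i
    # Walk the cut points backwards to recover the chosen intervals.
    intervals = []
    j = n
    while j > 0:
        intervals.append((cut[j], j))
        j = cut[j]
    intervals.reverse()
    return best[n], intervals
```

Because the single interval covering all columns is always a candidate, the result is never worse than compressing the whole table at once with the same compressor. The sketch calls the compressor on the order of n² times for n columns, so it is practical only on a small training sample.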
Taxonomy
Topics: Algorithms and Data Compression · Advanced Data Storage Technologies · Parallel Computing and Optimization Techniques
