Optimizations and Heuristics to improve Compression in Columnar Database Systems
Jayanth Jayanth

TL;DR
This paper introduces two novel optimizations for compressing data in in-memory columnar databases and proposes heuristics for selecting the best encoding scheme, so that large-scale data can be handled more efficiently.
Contribution
The paper presents two new compression optimizations, Block Size Optimized Cluster Encoding and Block Size Optimized Indirect Encoding, which perform better than their predecessors, along with heuristics for selecting the best compression scheme.
Findings
Optimized compression techniques improve data size reduction.
Heuristics effectively select the best encoding scheme.
Enhanced performance in large-scale data processing.
Abstract
In-memory columnar databases have become mainstream over the last decade and have vastly improved the processing of large volumes of data through multi-core parallelism and in-memory compression, thereby eliminating the usual bottlenecks associated with disk-based databases. In scenarios where the data volume grows into terabytes and petabytes, keeping all the data in memory is exorbitantly expensive. Hence, the data is compressed efficiently using different algorithms to exploit multi-core parallelization technologies for query processing. Several compression methods have been studied for compressing the column array after Dictionary Encoding. In this paper, we present two novel optimizations in compression techniques, Block Size Optimized Cluster Encoding and Block Size Optimized Indirect Encoding, which perform better than their predecessors. In the end, we also propose…
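To make the terms in the abstract concrete, the following is a minimal sketch of Dictionary Encoding followed by basic, fixed-block-size Cluster Encoding as it is commonly described for column stores. It is not the paper's Block Size Optimized variant, whose details are not given in this summary. The function names, the boolean list standing in for a bit vector, and the rule that a trailing partial block is never collapsed are all assumptions made for illustration.

```python
from typing import List, Tuple


def dictionary_encode(column: List[str]) -> Tuple[List[str], List[int]]:
    """Replace each value by its position in a sorted dictionary of distinct values."""
    dictionary = sorted(set(column))
    position = {value: i for i, value in enumerate(dictionary)}
    return dictionary, [position[value] for value in column]


def cluster_encode(ids: List[int], block_size: int) -> Tuple[List[int], List[bool]]:
    """Basic cluster encoding (illustrative, fixed block size).

    The value-ID array is cut into blocks of `block_size` entries. A full block
    holding a single distinct ID is stored as one entry; every other block is
    stored unchanged. `collapsed[i]` records whether block i was collapsed.
    """
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    values: List[int] = []
    collapsed: List[bool] = []
    for start in range(0, len(ids), block_size):
        block = ids[start:start + block_size]
        # Assumption: a trailing partial block is always kept verbatim.
        if len(block) == block_size and len(set(block)) == 1:
            values.append(block[0])
            collapsed.append(True)
        else:
            values.extend(block)
            collapsed.append(False)
    return values, collapsed


def cluster_decode(values: List[int], collapsed: List[bool],
                   block_size: int, length: int) -> List[int]:
    """Invert cluster_encode, given the original array length."""
    out: List[int] = []
    pos = 0
    for flag in collapsed:
        if flag:
            # A collapsed block expands back to block_size copies of one ID.
            out.extend([values[pos]] * block_size)
            pos += 1
        else:
            # An uncollapsed block holds the remaining entries, up to block_size.
            take = min(block_size, length - len(out))
            out.extend(values[pos:pos + take])
            pos += take
    return out
```

In this sketch the length of the encoded array depends on how the chosen block size lines up with runs of equal IDs in the column, which is presumably the parameter that a block-size optimization tunes.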
Taxonomy
Topics: Algorithms and Data Compression · Advanced Data Storage Technologies · Error Correcting Code Techniques
