Histogram-Aware Sorting for Enhanced Word-Aligned Compression in Bitmap Indexes
Owen Kaser, Daniel Lemire, Kamel Aouiche

TL;DR
This paper studies how row order affects bitmap indexes compressed with run-length-based schemes such as Word-Aligned Hybrid (WAH). A simple lexicographical sort of the rows can divide the index size by 9 and make indexes several times faster, and choosing the column order from attribute-value histograms can increase sorting efficiency by 40%.
Contribution
It investigates row-reordering heuristics based on computed attribute-value histograms, including permuting the table's columns before sorting, to improve the compression and speed of word-aligned bitmap indexes.
Findings
A simple lexicographical sort of the rows can divide the index size by 9.
Permuting the table's columns based on attribute-value histograms can increase sorting efficiency by 40%.
Sorting can also make the indexes several times faster.
Abstract
Bitmap indexes must be compressed to reduce input/output costs and minimize CPU usage. To accelerate logical operations (AND, OR, XOR) over bitmaps, we use techniques based on run-length encoding (RLE), such as Word-Aligned Hybrid (WAH) compression. These techniques are sensitive to the order of the rows: a simple lexicographical sort can divide the index size by 9 and make indexes several times faster. We investigate reordering heuristics based on computed attribute-value histograms. Simply permuting the columns of the table based on these histograms can increase the sorting efficiency by 40%.
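The sensitivity to row order described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the table, the helper names (`build_bitmaps`, `total_runs`, `sort_by_columns`), and the use of a plain run count as a stand-in for compressed size are all assumptions made for illustration. WAH itself compresses word-aligned groups of bits, so real index sizes will differ from these counts.

```python
from itertools import groupby


def build_bitmaps(rows):
    """Build one uncompressed bitmap (list of 0/1) per (column, value) pair."""
    bitmaps = {}
    for col in range(len(rows[0])):
        for value in {row[col] for row in rows}:
            bitmaps[(col, value)] = [int(row[col] == value) for row in rows]
    return bitmaps


def total_runs(rows):
    """Count runs of identical bits over all bitmaps.

    Assumption: fewer runs is used here as a rough proxy for a smaller
    RLE-compressed index; it is not a WAH size computation.
    """
    return sum(len(list(groupby(bits))) for bits in build_bitmaps(rows).values())


def sort_by_columns(rows, order):
    """Sort rows lexicographically, comparing columns in the given order."""
    return sorted(rows, key=lambda row: tuple(row[c] for c in order))


# Hypothetical table: column 0 has 2 distinct values, column 1 has 4.
rows = [
    ("b", 3), ("a", 1), ("b", 2), ("a", 4),
    ("a", 2), ("b", 4), ("a", 3), ("b", 1),
]

runs_unsorted = total_runs(rows)
runs_col0_first = total_runs(sort_by_columns(rows, (0, 1)))
runs_col1_first = total_runs(sort_by_columns(rows, (1, 0)))
print(runs_unsorted, runs_col0_first, runs_col1_first)
```

On this toy table the run counts are 32 unsorted, 22 when sorting on column 0 first, and 26 when sorting on column 1 first. Sorting reduces the number of runs, and the result depends on which column leads the sort key. The paper selects the column order from attribute-value histograms; that selection rule is not reproduced here.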
Taxonomy
Topics: Algorithms and Data Compression · Advanced Database Systems and Queries · Web Data Mining and Analysis
