Sorting Fact Tables and Compressing Bitmap Indexes with Word Alignment
Kamel Aouiche, Daniel Lemire, Owen Kaser

TL;DR
This paper investigates how sorting fact tables affects bitmap index compression and query performance. It finds that lexicographic sorting significantly improves both compression and speed, especially when columns with many distinct values are placed first.
Contribution
It introduces and evaluates different sorting strategies for fact tables to optimize bitmap index compression and query efficiency, highlighting the effectiveness of lexicographic sorting.
Findings
Lexicographic sorting can double index compression efficiency.
Sorting can make indexes several times faster.
Column order impacts compression and performance.
Abstract
Bitmap indexes are frequently used to index multidimensional data. They rely mostly on sequential input/output. Bitmaps can be compressed to reduce input/output costs and minimize CPU usage. The most efficient compression techniques are based on run-length encoding (RLE), such as Word-Aligned Hybrid (WAH) compression. This type of compression accelerates logical operations (AND, OR) over the bitmaps. However, run-length encoding is sensitive to the order of the facts. Thus, we propose to sort the fact tables. We review lexicographic, Gray-code, and block-wise sorting. We found that a lexicographic sort improves compression (sometimes generating indexes twice as small) and makes indexes several times faster. While sorting takes time, this is partially offset by the fact that it is faster to index a sorted table. Column order is significant: it is generally preferable to put the columns…
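The sensitivity of run-length encoding to row order can be illustrated with a small sketch. The code below is not the authors' implementation and does not implement WAH. It builds one uncompressed bitmap per (column, value) pair for a synthetic two-column table and counts runs of identical bits as a rough proxy for RLE-compressed size. The table shape, the cardinalities (50 and 5), and the helper names `build_bitmaps`, `count_runs`, and `total_runs` are assumptions made for illustration.

```python
import random


def build_bitmaps(rows):
    """Return one bitmap (list of 0/1) per (column index, value) pair."""
    n = len(rows)
    bitmaps = {}
    for i, row in enumerate(rows):
        for col, value in enumerate(row):
            # Create an all-zero bitmap on first sight, then set bit i.
            bitmaps.setdefault((col, value), [0] * n)[i] = 1
    return bitmaps


def count_runs(bits):
    """Count maximal runs of identical bits; fewer runs suit RLE better."""
    if not bits:
        return 0
    return 1 + sum(1 for a, b in zip(bits, bits[1:]) if a != b)


def total_runs(rows):
    """Sum the run counts over every bitmap of the index."""
    return sum(count_runs(b) for b in build_bitmaps(rows).values())


# Hypothetical fact table: 1000 rows, two columns with 50 and 5 distinct values.
rng = random.Random(0)
table = [(rng.randrange(50), rng.randrange(5)) for _ in range(1000)]

unsorted_runs = total_runs(table)
# Python compares tuples lexicographically, so sorted() is a lexicographic sort.
sorted_runs = total_runs(sorted(table))

print("runs before sorting:", unsorted_runs)
print("runs after sorting: ", sorted_runs)
```

After a lexicographic sort, each bitmap of the first column has at most three runs, and the bitmaps of the second column are clustered within each first-column group, so the total run count drops. Run count is only a stand-in here: actual WAH sizes depend on word alignment, which this sketch ignores.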
Taxonomy
Topics: Advanced Database Systems and Queries · Data Management and Algorithms · Algorithms and Data Compression
