TL;DR
This paper introduces an improved Roaring bitmap format that combines uncompressed bitmaps, packed arrays, and run-length-encoded (RLE) segments. It processes data faster and compresses better than traditional RLE-based methods, with the largest compression gains on sorted data.
Contribution
A new hybrid Roaring bitmap format that integrates uncompressed bitmaps, packed arrays, and RLE segments, improving both compression and speed over existing methods.
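The container choice behind the hybrid format can be sketched as a size comparison on each chunk of 2^16 low-order values. The byte costs below (2 bytes per array value, a fixed 8 KiB bitmap, and a 2-byte run count plus 4 bytes per run) are assumptions modeled on common Roaring implementations. The function names are hypothetical, and the paper's own heuristics may differ.

```python
# Hypothetical sketch: pick the smallest container for one chunk of 16-bit values.
# Assumed size model (modeled on Roaring, not taken from the paper):
#   array  : 2 bytes per stored 16-bit value
#   bitmap : 2^16 bits = 8192 bytes, regardless of cardinality
#   run    : 2-byte run count + 4 bytes per (start, length) pair

BITMAP_BYTES = (1 << 16) // 8  # 8192


def to_runs(values):
    """Encode sorted, distinct 16-bit values as (start, length) pairs.

    Each run covers start..start+length inclusive, so length = count - 1.
    """
    runs = []
    for v in values:
        if runs and v == runs[-1][0] + runs[-1][1] + 1:
            start, length = runs[-1]
            runs[-1] = (start, length + 1)  # extend the current run by one value
        else:
            runs.append((v, 0))  # v is not adjacent: start a new run
    return runs


def container_sizes(values):
    """Estimated size in bytes of each container type under the assumed model."""
    return {
        "array": 2 * len(values),
        "bitmap": BITMAP_BYTES,
        "run": 2 + 4 * len(to_runs(values)),
    }


def choose_container(values):
    """Return the container type with the smallest estimated size.

    Ties go to the earlier entry in container_sizes (array, then bitmap).
    """
    sizes = container_sizes(values)
    return min(sizes, key=sizes.get)
```

With this size model an array is no larger than a bitmap up to 4096 values, while a run container wins whenever a chunk consists of a few long runs, regardless of how many values it holds.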
Findings
Up to 100x faster than traditional RLE-based bitmap indexes.
Better compression than the original Roaring format on sorted data, where bitmaps contain long runs.
Effective in real-world database and search engine platforms.
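Why sorting helps can be seen by counting runs in a simple bitmap index, where each distinct value of a column gets a bitmap with one bit per row. This toy example uses a hypothetical, randomly generated column and a hypothetical helper `runs_per_value`. It illustrates the effect only and is not the paper's benchmark.

```python
import random


def runs_per_value(column):
    """For each distinct value x, count the runs of set bits in x's bitmap.

    Row i sets bit i in the bitmap of column[i]; a run is a maximal stretch
    of consecutive rows holding the same value.
    """
    runs = {}
    prev = object()  # sentinel that equals no column value
    for x in column:
        if x != prev:
            runs[x] = runs.get(x, 0) + 1  # a new run of x starts at this row
        prev = x
    return runs


# Hypothetical low-cardinality column: 100,000 rows, 10 distinct values.
rng = random.Random(42)
column = [rng.randrange(10) for _ in range(100_000)]

unsorted_runs = sum(runs_per_value(column).values())        # most rows start a new run
sorted_runs = sum(runs_per_value(sorted(column)).values())  # one run per distinct value
print(unsorted_runs, sorted_runs)
```

Since RLE spends a roughly constant cost per run, sorting shrinks this index from tens of thousands of runs to ten, which is the situation where run-length encoded segments beat packed arrays and uncompressed bitmaps.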
Abstract
Compressed bitmap indexes are used in databases and search engines. Many bitmap compression techniques have been proposed, almost all relying primarily on run-length encoding (RLE). However, on unsorted data, we can get superior performance with a hybrid compression technique that uses both uncompressed bitmaps and packed arrays inside a two-level tree. An instance of this technique, Roaring, has recently been proposed. Due to its good performance, it has been adopted by several production platforms (e.g., Apache Lucene, Apache Spark, Apache Kylin and Druid). Yet there are cases where run-length encoded bitmaps are smaller than the original Roaring bitmaps, typically when the data is sorted so that the bitmaps contain long compressible runs. To better handle these cases, we build a new Roaring hybrid that combines uncompressed bitmaps, packed arrays and RLE compressed segments. The…
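The two-level tree described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: the class name `TwoLevelBitmap`, the 4096-value limit for switching from an array to a bitmap (the threshold commonly used by Roaring implementations, assumed here), and the use of a Python list and a `bytearray` as containers are assumptions chosen for readability. It covers only the two original container kinds, packed arrays and uncompressed bitmaps; the RLE segments of the new hybrid are omitted.

```python
from bisect import bisect_left

ARRAY_MAX = 4096  # assumed limit: chunks with more values become bitmaps


class TwoLevelBitmap:
    """Hypothetical sketch of Roaring's two-level layout for 32-bit unsigned ints.

    Top level: dict from the high 16 bits (chunk key) to a container.
    Containers: a sorted list of low 16-bit values ("packed array") for sparse
    chunks, or an uncompressed 2^16-bit bitmap (8192-byte bytearray) for dense ones.
    """

    def __init__(self, values):
        chunks = {}
        for v in sorted(set(values)):
            if not 0 <= v < (1 << 32):
                raise ValueError("values must be 32-bit unsigned integers")
            # Split each value into a chunk key (high bits) and a low 16-bit part.
            chunks.setdefault(v >> 16, []).append(v & 0xFFFF)
        self.containers = {}
        for key, lows in chunks.items():
            if len(lows) <= ARRAY_MAX:
                self.containers[key] = ("array", lows)
            else:
                bits = bytearray(8192)
                for low in lows:
                    bits[low >> 3] |= 1 << (low & 7)  # set bit `low`
                self.containers[key] = ("bitmap", bits)

    def __contains__(self, v):
        container = self.containers.get(v >> 16)
        if container is None:
            return False  # no values share this chunk key
        kind, data = container
        low = v & 0xFFFF
        if kind == "array":
            i = bisect_left(data, low)  # binary search in the packed array
            return i < len(data) and data[i] == low
        return ((data[low >> 3] >> (low & 7)) & 1) == 1  # test one bit
```

Splitting on the high 16 bits bounds every container to 2^16 possible values, so a dense chunk never needs more than 8 KiB, while a sparse chunk in a compact implementation costs about 2 bytes per value.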
