TL;DR
This paper introduces CRoaring, an optimized C library for Roaring bitmaps that leverages SIMD instructions for fast set operations, with comprehensive benchmarking against alternatives.
Contribution
It provides an efficient, SIMD-accelerated implementation of Roaring bitmaps and evaluates its performance relative to existing solutions.
Findings
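In the paper's benchmarks, CRoaring is faster than many competing bitmap libraries.
Vectorized (SIMD) algorithms for array intersection, union, difference and symmetric difference significantly speed up set operations.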
The library is open-source and ready for practical use.
Abstract
Compressed bitmap indexes are used in systems such as Git or Oracle to accelerate queries. They represent sets and often support operations such as unions, intersections, differences, and symmetric differences. Several important systems such as Elasticsearch, Apache Spark, Netflix's Atlas, LinkedIn's Pinot, Metamarkets' Druid, Pilosa, Apache Hive, Apache Tez, Microsoft Visual Studio Team Services and Apache Kylin rely on a specific type of compressed bitmap index called Roaring. We present an optimized software library written in C implementing Roaring bitmaps: CRoaring. It benefits from several algorithms designed for the single-instruction-multiple-data (SIMD) instructions available on commodity processors. In particular, we present vectorized algorithms to compute the intersection, union, difference and symmetric difference between arrays. We benchmark the library against a wide…
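To make the array set operations mentioned in the abstract concrete, here is a minimal scalar sketch of intersecting two sorted arrays. It is not the paper's vectorized algorithm and not part of the CRoaring API: the function name `intersect_sorted_u16` is hypothetical, and the choice of 16-bit elements is an assumption that mirrors how Roaring stores small sets as sorted arrays of 16-bit values. A SIMD version would aim to produce the same output while comparing several elements per instruction.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not a CRoaring function): scalar intersection of two
 * sorted, duplicate-free arrays of 16-bit values.
 * The caller must provide `out` with room for min(len_a, len_b) elements.
 * Returns the number of common values written to `out`. */
size_t intersect_sorted_u16(const uint16_t *a, size_t len_a,
                            const uint16_t *b, size_t len_b,
                            uint16_t *out) {
    size_t i = 0, j = 0, n = 0;
    while (i < len_a && j < len_b) {
        if (a[i] < b[j]) {
            i++;                 /* a[i] cannot appear in the rest of b */
        } else if (b[j] < a[i]) {
            j++;                 /* b[j] cannot appear in the rest of a */
        } else {
            out[n++] = a[i];     /* common value: emit once, advance both */
            i++;
            j++;
        }
    }
    return n;
}
```

Called on `{1, 3, 5, 7, 9}` and `{3, 4, 5, 6, 9, 10}`, this merge-style loop writes `3, 5, 9` and returns 3. Each iteration advances at least one index, so the running time is linear in the combined length of the inputs. The branch-heavy inner loop is the part a vectorized implementation would try to replace.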
