Bitmap Filter: Speeding up Exact Set Similarity Joins with Bitwise Operations
Edans F. O. Sandes, George Teodoro, Alba C. M. A. Melo

TL;DR
This paper introduces the Bitmap Filter, a novel bitwise operation-based technique that accelerates exact set similarity joins on CPUs and GPUs, achieving significant speedups over existing algorithms.
Contribution
The paper presents the Bitmap Filter, a new filtering method using bitmaps and bitwise operations to efficiently prune set pairs in similarity joins, applicable to multiple algorithms and hardware.
Findings
Achieved up to 4.50x speedup on CPU implementations.
Realized up to 577x acceleration using GPU algorithms.
Improved performance in 90% of tested scenarios.
Abstract
The Exact Set Similarity Join problem aims to find all similar sets between two collections of sets, with respect to a threshold and a similarity function such as overlap, Jaccard, Dice or cosine. The naive approach verifies all pairs of sets and is often considered impractical due to the high number of combinations. Exact Set Similarity Join algorithms are therefore usually based on the Filter-Verification Framework, which applies a series of filters to reduce the number of verified pairs. This paper presents a new filtering technique called Bitmap Filter, which is able to accelerate state-of-the-art algorithms for the exact Set Similarity Join problem. The Bitmap Filter uses hash functions to create bitmaps of fixed b bits, representing characteristics of the sets. Then, it applies bitwise operations (such as xor and population count) on the bitmaps in order to infer a similarity upper bound…
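The bitmap idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the modular hash, the 64-bit width, and all function names are assumptions made for illustration. The bound it uses follows from a simple observation: every bit that differs between two bitmaps must come from at least one token that belongs to only one of the sets, so popcount(b_r xor b_s) ≤ |r| + |s| − 2|r ∩ s|, which gives |r ∩ s| ≤ (|r| + |s| − popcount(b_r xor b_s)) / 2.

```python
B = 64  # bitmap width in bits (assumed value for this sketch)


def make_bitmap(tokens, b=B):
    """Set bit (t mod b) for every integer token t (hypothetical hash choice)."""
    bitmap = 0
    for t in tokens:
        bitmap |= 1 << (t % b)
    return bitmap


def overlap_upper_bound(size_r, size_s, bm_r, bm_s):
    """Upper bound on |r ∩ s| inferred from xor + population count."""
    differing_bits = bin(bm_r ^ bm_s).count("1")
    return (size_r + size_s - differing_bits) // 2


def may_be_similar(r, s, threshold, b=B):
    """Return False only when the pair provably has Jaccard < threshold.

    A True result means the pair survives the filter and still has to be
    verified exactly; hash collisions can only loosen the bound, never
    cause a truly similar pair to be pruned.
    """
    total = len(r) + len(s)
    if total == 0:
        return True  # two empty sets: leave the decision to verification
    ub = overlap_upper_bound(len(r), len(s), make_bitmap(r, b), make_bitmap(s, b))
    # Jaccard = overlap / (|r| + |s| - overlap) grows with the overlap,
    # so plugging in the overlap upper bound yields a Jaccard upper bound.
    return ub / (total - ub) >= threshold


r = {1, 2, 3, 4}
s = {3, 4, 5, 6}
print(may_be_similar(r, s, 0.5))  # pruned: the bound gives Jaccard <= 2/6
print(may_be_similar(r, r, 0.9))  # identical sets always pass the filter
```

In a real join, the bitmaps would be computed once per set and stored, so that each candidate pair costs only one xor and one population count before the more expensive verification step.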
