Fast Set Intersection in Memory
Bolin Ding (UIUC), Arnd Christian König (Microsoft Research)

TL;DR
This paper presents new memory-efficient (linear-space) data structures for fast set intersection. They achieve a better worst-case expected running time and better practical performance than existing methods used in information retrieval and database systems.
Contribution
It introduces linear-space data structures that support worst-case efficient set intersection, along with a simpler practical variant. The resulting algorithms outperform current techniques.
Findings
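Expected intersection time is O(n/√w + kr), where n is the total number of elements across the k sets, r is the intersection size, and w is the number of bits in a machine word.
The proposed algorithms outperform state-of-the-art methods in execution time on both synthetic and real data sets and workloads.
A simpler variant of the algorithm has weaker asymptotic guarantees but performs even better in practice.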
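As a worked instance of the bound above (illustrative arithmetic only, not a measurement from the paper), assume a 64-bit machine word and an input of n = 10^6 elements in total:

```latex
O\!\left(\frac{n}{\sqrt{w}} + k r\right),
\qquad w = 64 \;\Rightarrow\; \frac{n}{\sqrt{w}} = \frac{n}{8},
\qquad n = 10^{6} \;\Rightarrow\; \frac{n}{8} = 125{,}000 .
```

The input-dependent term is n divided by √w rather than n itself, while the kr term grows only with the number of sets k and the output size r.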
Abstract
Set intersection is a fundamental operation in information retrieval and database systems. This paper introduces linear-space data structures to represent sets such that their intersection can be computed in a worst-case efficient way. In general, given k (preprocessed) sets with n elements in total, we show how to compute their intersection in expected time O(n/√w + kr), where r is the intersection size and w is the number of bits in a machine word. In addition, we introduce a very simple version of this algorithm that has weaker asymptotic guarantees but performs even better in practice; both algorithms outperform state-of-the-art techniques in terms of execution time on both synthetic and real data sets and workloads.
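The √w term in the bound points to word-level parallelism. The Python sketch below illustrates one such idea under stated assumptions; it is not the authors' algorithm. Each set is split into groups by a shared hash, and each group gets a W-bit signature from a second hash. ANDing the signatures of the same group across all k sets lets the scan skip groups that cannot share an element. The names `preprocess`, `intersect`, `_h`, `W` and `num_groups` are hypothetical, and the hash, the fixed group count and the group sizes are simplifications. The paper's own grouping scheme and parameter choices, which yield the O(n/√w + kr) bound, are not reproduced here.

```python
from functools import reduce

W = 64  # notional machine-word size in bits (Python ints are unbounded)


def _h(x: int, seed: int) -> int:
    """Hypothetical hash for illustration; not the hash family used in the paper."""
    return hash((seed, x))


def preprocess(elements, num_groups: int) -> dict:
    """Map each group id to [W-bit signature, set of members].

    All sets that will be intersected must use the same num_groups so
    that a common element lands in the same group id in every set.
    """
    index = {}
    for x in elements:
        gid = _h(x, 1) % num_groups            # shared grouping hash
        entry = index.setdefault(gid, [0, set()])
        entry[0] |= 1 << (_h(x, 2) % W)        # set bit h(x) in the group's word
        entry[1].add(x)
    return index


def intersect(indexes: list) -> set:
    """Intersect k >= 1 preprocessed sets, skipping groups whose ANDed words are 0."""
    if not indexes:
        return set()
    result = set()
    smallest = min(indexes, key=len)           # iterate over the fewest group ids
    for gid in smallest:
        entries = [idx.get(gid) for idx in indexes]
        if any(e is None for e in entries):
            continue                           # some set has no members in this group
        word = reduce(lambda acc, e: acc & e[0], entries, (1 << W) - 1)
        if word == 0:
            continue                           # no element can be in all k groups
        # Nonzero word: a possible match (or a hash collision), so check explicitly.
        result |= set.intersection(*(e[1] for e in entries))
    return result


if __name__ == "__main__":
    a = preprocess(range(0, 1000, 2), num_groups=64)
    b = preprocess(range(0, 1000, 3), num_groups=64)
    c = preprocess(range(0, 1000, 5), num_groups=64)
    print(sorted(intersect([a, b, c])))        # multiples of 30 below 1000
```

The filter never drops a true result: an element common to all k sets falls into the same group everywhere and sets the same bit in every signature, so the ANDed word is nonzero. Collisions can only cause extra explicit checks, never wrong answers.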
Taxonomy
Topics: Advanced Database Systems and Queries · Data Management and Algorithms · Algorithms and Data Compression
