Fast evaluation of union-intersection expressions
Philip Bille, Anna Pagh, Rasmus Pagh

TL;DR
This paper introduces a linear-space data structure for evaluating expressions built from unions and intersections of preprocessed sets, with near-optimal expected time bounds in the RAM model and corresponding results in the I/O model.
Contribution
It presents a novel combination of approximate set representations and word-level parallelism for fast evaluation of set expressions.
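As a rough illustration of how these two ingredients can work together, the sketch below hashes each set into a bitmap, ANDs the bitmaps so that one machine-level operation handles many hash buckets at once, and then checks the surviving candidates exactly. This is not the paper's data structure. The names `bucket`, `signature`, and `intersect_filtered`, the multiplicative hash, and the `bits` parameter are all assumptions made for this example, and a Python integer stands in for an array of machine words.

```python
from typing import List, Set

MASK64 = (1 << 64) - 1
GOLDEN = 0x9E3779B97F4A7C15  # odd 64-bit constant for multiplicative hashing


def bucket(x: int, bits: int) -> int:
    """Map x to one of 2**bits buckets (hypothetical hash, chosen for illustration)."""
    return ((x * GOLDEN) & MASK64) >> (64 - bits)


def signature(s: Set[int], bits: int) -> int:
    """Approximate representation of s: bit i is set if some element hashes to bucket i."""
    sig = 0
    for x in s:
        sig |= 1 << bucket(x, bits)
    return sig


def intersect_filtered(sets: List[Set[int]], bits: int = 10) -> List[int]:
    """Intersect sets by ANDing signatures, then verifying candidates exactly."""
    if not sets:
        return []
    # Preprocessing step: in a real structure the signatures would be stored with the sets.
    combined = signature(sets[0], bits)
    for s in sets[1:]:
        combined &= signature(s, bits)  # bitwise AND covers many buckets per operation
    smallest = min(sets, key=len)
    # Only elements whose bucket survived the AND can be in the intersection.
    candidates = [x for x in smallest if (combined >> bucket(x, bits)) & 1]
    # The filter may admit false positives, so confirm membership exactly.
    return sorted(x for x in candidates if all(x in s for s in sets))
```

The filter never drops a true member of the intersection, because such an element sets the same bit in every signature. False positives are possible and are removed by the final exact check.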
Findings
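Expected intersection time on a RAM with word size $w$ is $O(n (\log w)^2 / w + km)$ for $m$ preprocessed sets containing $n$ elements in total, where $k$ is the number of elements in the intersection.
When the first term dominates, the method is a factor $w^{1-o(1)}$ faster than the standard solution of merging sorted lists.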
Lower bounds show near-optimality of the approach for small $m$.
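For reference, the baseline that these bounds are compared against, merging sorted lists, can be sketched as follows. This is a minimal version that assumes each list is sorted and free of duplicates; the function names `intersect_two` and `intersect_sorted` are chosen for this example and do not come from the paper.

```python
from typing import List


def intersect_two(a: List[int], b: List[int]) -> List[int]:
    """Intersect two sorted, duplicate-free lists with a linear two-pointer merge."""
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out


def intersect_sorted(lists: List[List[int]]) -> List[int]:
    """Intersect m sorted lists by folding pairwise merges, shortest list first."""
    if not lists:
        return []
    ordered = sorted(lists, key=len)
    result = ordered[0]
    for other in ordered[1:]:
        result = intersect_two(result, other)
    return result
```

Each merge step touches every element of its inputs one at a time, so the total work is linear in $n$. It gets no speedup from the word size $w$.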
Abstract
We show how to represent sets in a linear space data structure such that expressions involving unions and intersections of sets can be computed in a worst-case efficient way. This problem has applications in e.g. information retrieval and database systems. We mainly consider the RAM model of computation, and sets of machine words, but also state our results in the I/O model. On a RAM with word size $w$, a special case of our result is that the intersection of $m$ (preprocessed) sets, containing $n$ elements in total, can be computed in expected time $O(n (\log w)^2 / w + km)$, where $k$ is the number of elements in the intersection. If the first of the two terms dominates, this is a factor $w^{1-o(1)}$ faster than the standard solution of merging sorted lists. We show a cell probe lower bound, meaning that our upper bound is…
Taxonomy
Topics: Algorithms and Data Compression · Natural Language Processing Techniques · Semigroups and Automata Theory
