Set Containment Join Revisited
Panagiotis Bouros, Nikos Mamoulis, Shen Ge, Manolis Terrovitis

TL;DR
This paper introduces an improved framework for set containment joins that reduces computational cost and memory usage through adaptive prefix tree construction and partitioning, outperforming the state-of-the-art PRETTI algorithm.
Contribution
It presents a novel adaptive and partitioned approach to enhance set containment join efficiency and memory management over existing methods.
Findings
Significant performance improvements over PRETTI.
Reduced memory requirements during join processing.
Effective handling of real and synthetic datasets.
Abstract
Given two collections of set objects R and S, the set containment join returns all object pairs (r, s) ∈ R × S such that r ⊆ s. Besides being a basic operator in all modern data management systems with a wide range of applications, the join can be used to evaluate complex SQL queries based on relational division and as a module of data mining algorithms. The state-of-the-art algorithm for set containment joins (PRETTI) builds an inverted index on the right-hand collection S and a prefix tree on the left-hand collection R that groups set objects with common prefixes and thus avoids redundant processing. In this paper, we present a framework which improves PRETTI in two directions. First, we limit the prefix tree construction by proposing an adaptive methodology based on a cost model; this way, we can greatly reduce the space and time…
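
The baseline idea described in the abstract (an inverted index on the right-hand collection, a prefix tree on the left-hand collection, and shared work for common prefixes) can be illustrated with a minimal Python sketch. This is a simplified illustration only, not the authors' implementation and not the adaptive, partitioned framework the paper proposes. The names `build_inverted_index`, `build_prefix_tree`, and `containment_join` are hypothetical, collections are assumed to be dicts mapping an object id to a set of mutually comparable items, and the left-hand and right-hand collections are called `R` and `S` here.

```python
from collections import defaultdict


def build_inverted_index(S):
    """Map each item to the ids of the objects in S that contain it."""
    index = defaultdict(set)
    for sid, s in S.items():
        for item in s:
            index[item].add(sid)
    return index


class TrieNode:
    def __init__(self):
        self.children = {}  # item -> TrieNode
        self.ids = []       # ids of objects in R whose item sequence ends here


def build_prefix_tree(R):
    """Insert every object of R as a sorted item sequence (assumed global order)."""
    root = TrieNode()
    for rid, r in R.items():
        node = root
        for item in sorted(r):
            node = node.children.setdefault(item, TrieNode())
        node.ids.append(rid)
    return root


def containment_join(R, S):
    """Return all (rid, sid) pairs such that R[rid] is a subset of S[sid]."""
    index = build_inverted_index(S)
    root = build_prefix_tree(R)
    results = []
    # Each stack entry pairs a tree node with the ids of the objects in S
    # that contain every item on the path from the root to that node.
    stack = [(root, set(S))]
    while stack:
        node, candidates = stack.pop()
        for rid in node.ids:
            results.extend((rid, sid) for sid in candidates)
        for item, child in node.children.items():
            # One intersection per tree edge is shared by all objects in R
            # that have this prefix, which is where the savings come from.
            narrowed = candidates & index.get(item, set())
            if narrowed:
                stack.append((child, narrowed))
    return sorted(results)
```

For example, with `R = {"r1": {1, 2}, "r2": {1, 2, 3}}` and `S = {"s1": {1, 2, 3}, "s2": {1, 2}}`, the intersection for the shared prefix 1, 2 is computed once and reused for both `r1` and `r2`. Note that this sketch always builds the full prefix tree, which is the cost the paper's adaptive methodology aims to limit.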
