Set Containment Join Revisited

Panagiotis Bouros; Nikos Mamoulis; Shen Ge; Manolis Terrovitis

arXiv:1603.05422·cs.DB·March 18, 2016

TL;DR

This paper introduces an improved framework for set containment joins that reduces computational cost and memory usage through adaptive prefix tree construction and partitioning, outperforming the state-of-the-art PRETTI algorithm.

Contribution

The paper presents an adaptive, partitioned approach that makes set containment joins faster and less memory-intensive than existing methods.

Findings

1. Significant performance improvements over PRETTI.
2. Reduced memory requirements during join processing.
3. Effective handling of real and synthetic datasets.

Abstract

Given two collections of set objects $R$ and $S$, the $R \bowtie_{\subseteq} S$ set containment join returns all object pairs $(r, s) \in R \times S$ such that $r \subseteq s$. Besides being a basic operator in all modern data management systems with a wide range of applications, the join can be used to evaluate complex SQL queries based on relational division and as a module of data mining algorithms. The state-of-the-art algorithm for set containment joins (PRETTI) builds an inverted index on the right-hand collection $S$ and a prefix tree on the left-hand collection $R$ that groups set objects with common prefixes and thus avoids redundant processing. In this paper, we present a framework which improves PRETTI in two directions. First, we limit the prefix tree construction by proposing an adaptive methodology based on a cost model; this way, we can greatly reduce the space and time…
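
To make the prefix-tree-plus-inverted-index idea in the abstract concrete, here is a minimal Python sketch. It is not the authors' implementation and omits the adaptive construction and partitioning the paper proposes. All names (`TrieNode`, `build_inverted_index`, `build_prefix_tree`, `containment_join`) are hypothetical, collections are assumed to be dicts mapping an object id to a set of mutually comparable items, and items are ordered by plain sorting as a stand-in for whatever global item order a real system would pick.

```python
from collections import defaultdict


class TrieNode:
    """One node of the prefix tree built over the left-hand collection R."""

    def __init__(self):
        self.children = {}  # item -> TrieNode
        self.rids = []      # ids of R objects whose sorted item list ends here


def build_inverted_index(S):
    """Map each item to the set of ids of S objects that contain it."""
    index = defaultdict(set)
    for sid, s in S.items():
        for item in s:
            index[item].add(sid)
    return index


def build_prefix_tree(R):
    """Insert every R object as a path of its items in sorted order,
    so objects sharing a prefix share the corresponding tree nodes."""
    root = TrieNode()
    for rid, r in R.items():
        node = root
        for item in sorted(r):
            node = node.children.setdefault(item, TrieNode())
        node.rids.append(rid)
    return root


def containment_join(R, S):
    """Return all (rid, sid) pairs with R[rid] a subset of S[sid]."""
    index = build_inverted_index(S)
    root = build_prefix_tree(R)
    results = []
    # Each stack entry pairs a tree node with the ids of the S objects
    # that contain every item on the path from the root to that node.
    stack = [(root, set(S))]
    while stack:
        node, candidates = stack.pop()
        for rid in node.rids:
            results.extend((rid, sid) for sid in candidates)
        for item, child in node.children.items():
            # One intersection per tree edge: it is reused by every
            # R object below this node instead of being recomputed.
            narrowed = candidates & index.get(item, set())
            if narrowed:
                stack.append((child, narrowed))
    return sorted(results)


if __name__ == "__main__":
    R = {"r1": {1, 2}, "r2": {1, 2, 3}, "r3": {4}}
    S = {"s1": {1, 2, 3}, "s2": {1, 2}, "s3": {2, 4}}
    print(containment_join(R, S))
```

In this toy input, `r1` and `r2` share the prefix `1, 2`, so the candidate set for that prefix is intersected once and reused for both objects, which is the redundancy the prefix tree is meant to avoid.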
