Dynamic Enumeration of Similarity Joins
Pankaj K. Agarwal, Xiao Hu, Stavros Sintos, Jun Yang

TL;DR
This paper develops dynamic data structures that enumerate similarity-join results with worst-case delay guarantees: exact structures for certain metrics in low dimensions, and approximate, LSH-based structures for high dimensions.
Contribution
It introduces new dynamic data structures for similarity joins that support efficient updates and enumeration with delay guarantees, including near-linear size structures for specific metrics and LSH-based methods for high dimensions.
Findings
Exact data structures for the $\ell_1$ and $\ell_\infty$ metrics with $\text{polylog}(n)$ delay and update time.
Impossibility results for the $\ell_2$ metric in dimensions $d \geq 4$.
Unified linear-size data structure for approximate similarity join in $\ell_p$ metrics.
Abstract
This paper considers enumerating answers to similarity-join queries under dynamic updates: Given two sets of points $A, B$ in $\mathbb{R}^d$, a metric $\phi(\cdot)$, and a distance threshold $r > 0$, report all pairs of points $(a, b) \in A \times B$ with $\phi(a, b) \leq r$. Our goal is to store $A, B$ into a dynamic data structure that, whenever asked, can enumerate all result pairs with worst-case delay guarantee, i.e., the time between enumerating two consecutive pairs is bounded. Furthermore, the data structure can be efficiently updated when a point is inserted into or deleted from $A$ or $B$. We propose several efficient data structures for answering similarity-join queries in low dimension. For exact enumeration of similarity join, we present near-linear-size data structures for $\ell_1, \ell_\infty$ metrics with $\text{polylog}(n)$ update time and delay. We show that such a data…
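To make the problem statement concrete, here is a minimal Python sketch of a dynamic similarity join under the $\ell_\infty$ metric. It is a naive uniform-grid baseline, not the data structure from the paper: updates take constant expected time, but enumeration may inspect many candidate pairs that fail the distance test, so it offers no worst-case delay guarantee. The class name `GridJoin`, the side labels `"A"` and `"B"`, and the threshold parameter `r` are illustrative assumptions.

```python
from collections import defaultdict
from itertools import product
from typing import Dict, Iterator, Set, Tuple

Point = Tuple[float, ...]
Cell = Tuple[int, ...]


def linf(a: Point, b: Point) -> float:
    """Chebyshev (l_infinity) distance between two points of equal dimension."""
    return max(abs(x - y) for x, y in zip(a, b))


class GridJoin:
    """Naive baseline (hypothetical, not the paper's structure).

    Each point is hashed into a grid cell of side r. Two points within
    l_infinity distance r always fall into the same or adjacent cells,
    so only the 3^dim neighboring cells need to be checked.
    """

    def __init__(self, r: float, dim: int) -> None:
        if r <= 0:
            raise ValueError("r must be positive")
        self.r = r
        self.dim = dim
        # One cell -> points map for each input set.
        self.cells: Dict[str, Dict[Cell, Set[Point]]] = {
            "A": defaultdict(set),
            "B": defaultdict(set),
        }

    def _cell(self, p: Point) -> Cell:
        return tuple(int(x // self.r) for x in p)

    def insert(self, side: str, p: Point) -> None:
        self.cells[side][self._cell(p)].add(p)

    def delete(self, side: str, p: Point) -> None:
        cell = self._cell(p)
        bucket = self.cells[side].get(cell)
        if bucket is None:
            return
        bucket.discard(p)
        if not bucket:
            del self.cells[side][cell]  # drop empty cells to keep size linear

    def enumerate_pairs(self) -> Iterator[Tuple[Point, Point]]:
        """Lazily yield every (a, b) with linf(a, b) <= r.

        Candidates that fail the distance check are skipped silently,
        which is exactly why this baseline has no delay guarantee.
        """
        offsets = list(product((-1, 0, 1), repeat=self.dim))
        b_cells = self.cells["B"]
        for cell, a_points in self.cells["A"].items():
            for off in offsets:
                neighbor = tuple(c + o for c, o in zip(cell, off))
                b_points = b_cells.get(neighbor)
                if not b_points:
                    continue
                for a in a_points:
                    for b in b_points:
                        if linf(a, b) <= self.r:
                            yield (a, b)
```

A caller would build the structure with `GridJoin(r=1.0, dim=2)`, apply `insert("A", p)` or `delete("B", q)` as points arrive and leave, and iterate over `enumerate_pairs()` whenever the current join result is needed. The gap between two consecutive yielded pairs can grow with the number of nearby but non-joining points, which is the behavior a delay-guaranteed structure is meant to rule out.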
