Dynamic Enumeration of Similarity Joins

Pankaj K. Agarwal; Xiao Hu; Stavros Sintos; Jun Yang

arXiv:2105.01818·cs.DS·May 6, 2021

TL;DR

This paper develops dynamic data structures for enumerating similarity-join results with worst-case delay guarantees: exact structures for certain metrics and approximate structures for $\ell_p$ metrics in low dimensions, plus an LSH-based method for high dimensions.

Contribution

It introduces new dynamic data structures for similarity joins that support efficient updates and enumeration with delay guarantees, including near-linear-size structures for specific metrics and LSH-based methods for high dimensions.

Findings

01

Exact data structures for $\ell_1$ and $\ell_\infty$ metrics with $\text{polylog}(n)$ delay and update time.

02

Impossibility results for the $\ell_2$ metric in dimensions $d \geq 4$.

03

Unified linear-size data structure for approximate similarity join in $\ell_p$ metrics.

Abstract

This paper considers enumerating answers to similarity-join queries under dynamic updates: Given two sets of $n$ points $A, B$ in $\mathbb{R}^{d}$, a metric $\phi(\cdot)$, and a distance threshold $r > 0$, report all pairs of points $(a, b) \in A \times B$ with $\phi(a, b) \leq r$. Our goal is to store $A, B$ into a dynamic data structure that, whenever asked, can enumerate all result pairs with worst-case delay guarantee, i.e., the time between enumerating two consecutive pairs is bounded. Furthermore, the data structure can be efficiently updated when a point is inserted into or deleted from $A$ or $B$. We propose several efficient data structures for answering similarity-join queries in low dimension. For exact enumeration of similarity join, we present near-linear-size data structures for $\ell_{1}, \ell_{\infty}$ metrics with $\log^{O(1)} n$ update time and delay. We show that such a data…
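To make the problem statement concrete, here is a minimal brute-force baseline in Python. It is not the data structure from the paper: it only illustrates the interface described in the abstract (insert, delete, enumerate all pairs with $\phi(a, b) \leq r$). The class name `NaiveSimilarityJoin` and its method names are hypothetical, chosen for this illustration. Updates take constant expected time, but the gap between two consecutive reported pairs can be as large as $|A| \cdot |B|$ distance evaluations, which is exactly the kind of unbounded delay the paper aims to avoid.

```python
from typing import Callable, Iterator, Sequence, Set, Tuple

Point = Tuple[float, ...]
Metric = Callable[[Point, Point], float]


def l1(a: Point, b: Point) -> float:
    """Manhattan (l_1) distance."""
    return sum(abs(x - y) for x, y in zip(a, b))


def linf(a: Point, b: Point) -> float:
    """Chebyshev (l_infinity) distance."""
    return max(abs(x - y) for x, y in zip(a, b))


class NaiveSimilarityJoin:
    """Hypothetical baseline: stores A and B as plain sets.

    Assumption: this is a naive reference, not the paper's structure.
    It offers no useful bound on enumeration delay.
    """

    def __init__(self, r: float, metric: Metric = linf) -> None:
        self.r = r
        self.metric = metric
        self.A: Set[Point] = set()
        self.B: Set[Point] = set()

    def _side(self, side: str) -> Set[Point]:
        if side == "A":
            return self.A
        if side == "B":
            return self.B
        raise ValueError("side must be 'A' or 'B'")

    def insert(self, side: str, p: Sequence[float]) -> None:
        self._side(side).add(tuple(p))

    def delete(self, side: str, p: Sequence[float]) -> None:
        self._side(side).discard(tuple(p))

    def enumerate_pairs(self) -> Iterator[Tuple[Point, Point]]:
        # Lazily report every (a, b) in A x B within distance r.
        # Many non-matching pairs may be scanned between two outputs.
        for a in self.A:
            for b in self.B:
                if self.metric(a, b) <= self.r:
                    yield (a, b)


if __name__ == "__main__":
    join = NaiveSimilarityJoin(r=1.0, metric=linf)
    join.insert("A", (0.0, 0.0))
    join.insert("B", (0.5, 1.0))
    join.insert("B", (2.0, 0.0))
    print(list(join.enumerate_pairs()))
    join.delete("B", (0.5, 1.0))
    print(list(join.enumerate_pairs()))
```

Using a generator mirrors the enumeration model: the caller pulls one pair at a time, and the quantity of interest is the work done between two consecutive `yield`s.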
