On the Complexity of Inner Product Similarity Join
Thomas D. Ahle, Rasmus Pagh, Ilya Razenshteyn, Francesco Silvestri

TL;DR
This paper studies the computational complexity of inner product similarity join (IPS join), proving new lower and upper bounds and proposing a linear sketch-based indexing method.
Contribution
It provides the first systematic study of IPS join, including new lower bounds, upper bounds, and a linear sketch-based indexing method, clarifying the role of asymmetry and hardness assumptions.
Findings
Approximation hardness of IPS join in subquadratic time, assuming the strong exponential time hypothesis (SETH)
New bounds for ALSH-based algorithms
A linear sketch-based indexing method
Abstract
A number of tasks in classification, information retrieval, recommendation systems, and record linkage reduce to the core problem of inner product similarity join (IPS join): identifying pairs of vectors in a collection that have a sufficiently large inner product. IPS join is well understood when vectors are normalized and some approximation of inner products is allowed. However, the general case where vectors may have any length appears much more challenging. Recently, new upper bounds based on asymmetric locality-sensitive hashing (ALSH) and asymmetric embeddings have emerged, but little has been known on the lower bound side. In this paper we initiate a systematic study of inner product similarity join, showing new lower and upper bounds. Our main results are:
* Approximation hardness of IPS join in subquadratic time, assuming the strong exponential time hypothesis.
* New upper…
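The problem statement in the abstract can be made concrete with a brute-force baseline. This is only a minimal sketch, not the paper's method: the function names, the threshold parameter `s`, and the choice of signed (rather than absolute) inner product are assumptions made for illustration. It compares every pair, so it runs in quadratic time, which is the cost that the subquadratic hardness result is measured against.

```python
from typing import List, Sequence, Tuple


def inner_product(p: Sequence[float], q: Sequence[float]) -> float:
    """Plain dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(p, q))


def ips_join_bruteforce(
    P: Sequence[Sequence[float]],
    Q: Sequence[Sequence[float]],
    s: float,
) -> List[Tuple[int, int]]:
    """Return index pairs (i, j) with <P[i], Q[j]> >= s.

    Hypothetical exact baseline: checks all |P| * |Q| pairs.
    Vectors are NOT assumed to be normalized.
    """
    return [
        (i, j)
        for i, p in enumerate(P)
        for j, q in enumerate(Q)
        if inner_product(p, q) >= s
    ]


# Toy data with vectors of different lengths (norms).
P = [(1.0, 0.0), (3.0, 4.0)]
Q = [(2.0, 0.0), (0.0, 0.5)]
pairs = ips_join_bruteforce(P, Q, s=2.0)
print(pairs)  # [(0, 0), (1, 0), (1, 1)]
```

In the toy data, the long vector `(3.0, 4.0)` matches both query vectors even though it points in a different direction from each, which illustrates why unnormalized inputs behave differently from the cosine-similarity setting.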
