On the Complexity of Inner Product Similarity Join

Thomas D. Ahle; Rasmus Pagh; Ilya Razenshteyn; Francesco Silvestri

arXiv:1510.02824 · cs.DS · April 8, 2016

TL;DR

This paper studies the computational complexity of inner product similarity join (IPS join), proving new lower and upper bounds and proposing a linear sketch-based indexing method.

Contribution

It initiates a systematic study of IPS join, giving new lower bounds, new upper bounds, and a linear sketch-based indexing method, and it clarifies the role of asymmetry and hardness assumptions.

Findings

01. Approximation hardness of IPS join in subquadratic time, assuming the strong exponential time hypothesis (SETH)

02. New bounds for ALSH-based algorithms

03. A linear sketch-based indexing method

Abstract

A number of tasks in classification, information retrieval, recommendation systems, and record linkage reduce to the core problem of inner product similarity join (IPS join): identifying pairs of vectors in a collection that have a sufficiently large inner product. IPS join is well understood when vectors are normalized and some approximation of inner products is allowed. However, the general case where vectors may have any length appears much more challenging. Recently, new upper bounds based on asymmetric locality-sensitive hashing (ALSH) and asymmetric embeddings have emerged, but little has been known on the lower bound side. In this paper we initiate a systematic study of inner product similarity join, showing new lower and upper bounds. Our main results are:

* Approximation hardness of IPS join in subquadratic time, assuming the strong exponential time hypothesis.

* New upper…
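The core problem described in the abstract, reporting pairs of vectors whose inner product is sufficiently large, can be illustrated with a naive quadratic-time baseline. This is only a sketch of the problem statement, not one of the paper's algorithms. The function names, the two-collection form (`P` joined against `Q`), and the threshold parameter `s` are assumptions made for illustration.

```python
from itertools import product


def inner_product(p, q):
    """Plain inner product of two equal-length numeric sequences."""
    return sum(a * b for a, b in zip(p, q))


def ips_join_bruteforce(P, Q, s):
    """Return every index pair (i, j) with <P[i], Q[j]> >= s.

    Hypothetical baseline: it checks all |P| * |Q| pairs exactly,
    which is the quadratic cost that subquadratic methods try to beat.
    """
    return [
        (i, j)
        for (i, p), (j, q) in product(enumerate(P), enumerate(Q))
        if inner_product(p, q) >= s
    ]


# Small example: the vectors are not normalized, so a long vector
# such as (3, 3) can dominate the result.
P = [(1, 0), (0, 2), (3, 3)]
Q = [(1, 1), (2, 0)]
pairs = ips_join_bruteforce(P, Q, s=3)
```

Because the vectors may have any length, the long vector `(3, 3)` matches both vectors in `Q` in this example while the others match none. That is the unnormalized setting the abstract calls more challenging.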
