Clustering is Efficient for Approximate Maximum Inner Product Search
Alex Auvolat, Sarath Chandar, Pascal Vincent, Hugo Larochelle, Yoshua Bengio

TL;DR
This paper introduces a simple spherical k-means clustering approach for approximate maximum inner product search, outperforming existing hashing and tree-based methods in speed and robustness on benchmark datasets.
Contribution
The paper demonstrates that spherical k-means clustering, applied after reducing MIPS to a Maximum Cosine Similarity Search (MCSS), is an effective and robust solution for approximate MIPS, outperforming the hashing-based and tree-based methods it is compared against.
Findings
Achieves higher speedups at the same retrieval precision than the hashing-based and tree-based methods compared.
Provides more robust retrievals under noisy query conditions.
Effective on two standard recommendation system benchmarks and on large-vocabulary word embeddings.
Abstract
Efficient Maximum Inner Product Search (MIPS) is an important task that has a wide applicability in recommendation systems and classification with a large number of classes. Solutions based on locality-sensitive hashing (LSH) as well as tree-based solutions have been investigated in the recent literature, to perform approximate MIPS in sublinear time. In this paper, we compare these to another extremely simple approach for solving approximate MIPS, based on variants of the k-means clustering algorithm. Specifically, we propose to train a spherical k-means, after having reduced the MIPS problem to a Maximum Cosine Similarity Search (MCSS). Experiments on two standard recommendation system benchmarks as well as on large vocabulary word embeddings, show that this simple approach yields much higher speedups, for the same retrieval precision, than current state-of-the-art hashing-based and…
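The pipeline the abstract describes (reduce MIPS to MCSS, cluster with spherical k-means, then search only the most promising clusters) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not say which reduction is used, so the sketch assumes one common choice (append a coordinate so that all database vectors share the same norm, and append a zero to the query). The function names, the `n_probe` parameter, and the flat, non-hierarchical clustering are all hypothetical choices made for the example.

```python
import numpy as np

def reduce_mips_to_mcss(database):
    """Assumed reduction: append one coordinate so all rows share the max norm."""
    norms = np.linalg.norm(database, axis=1)
    max_norm = norms.max()
    # The extra coordinate absorbs the norm difference; clip guards rounding error.
    extra = np.sqrt(np.maximum(max_norm**2 - norms**2, 0.0))
    return np.hstack([database, extra[:, None]])

def normalize_rows(x):
    """Scale each row to unit length (rows of zeros are left near zero)."""
    return x / np.maximum(np.linalg.norm(x, axis=1, keepdims=True), 1e-12)

def spherical_kmeans(points, n_clusters, n_iters=20, seed=0):
    """Spherical k-means: assign by cosine similarity, keep centroids on the unit sphere."""
    rng = np.random.default_rng(seed)
    unit = normalize_rows(points)
    # Initialise centroids from randomly chosen data points (fancy indexing copies).
    centroids = unit[rng.choice(len(unit), size=n_clusters, replace=False)]
    for _ in range(n_iters):
        assign = np.argmax(unit @ centroids.T, axis=1)
        for k in range(n_clusters):
            members = unit[assign == k]
            if len(members) > 0:  # an empty cluster keeps its previous centroid
                centroids[k] = members.sum(axis=0)
        centroids = normalize_rows(centroids)
    assign = np.argmax(unit @ centroids.T, axis=1)
    return centroids, assign

def approximate_mips(query, database, centroids, assign, n_probe=2):
    """Score only the points in the n_probe clusters closest to the query."""
    aug_query = np.append(query, 0.0)  # the query gets a zero in the extra coordinate
    top_clusters = np.argsort(-(centroids @ aug_query))[:n_probe]
    candidates = np.flatnonzero(np.isin(assign, top_clusters))
    if candidates.size == 0:  # fall back to exhaustive search if all probed clusters are empty
        candidates = np.arange(len(database))
    scores = database[candidates] @ query  # exact inner products on the shortlist
    return candidates[np.argmax(scores)]
```

Because every augmented database vector has the same norm and the query's extra coordinate is zero, ranking by cosine similarity in the augmented space matches ranking by inner product in the original space. In this sketch, `n_probe` trades speed for precision: probing every cluster recovers the exact answer, while probing fewer clusters scores fewer candidates.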
Taxonomy
Topics: Advanced Image and Video Retrieval Techniques · Algorithms and Data Compression · Image Retrieval and Classification Techniques
Methods: k-Means Clustering
