Bridging Dense and Sparse Maximum Inner Product Search
Sebastian Bruch, Franco Maria Nardini, Amir Ingber, Edo Liberty

TL;DR
This paper explores applying dense vector algorithms to sparse vectors in maximum inner product search, demonstrating IVF-based retrieval as an efficient solution and proposing a unified approach for mixed dense-sparse vectors.
Contribution
It introduces IVF-based retrieval for sparse MIPS, analyzes dimensionality reduction techniques, and proposes a unified framework for vectors with dense and sparse subspaces.
Findings
IVF is effective for sparse MIPS
Dimensionality reduction improves sparse vector retrieval
Unified approach is robust across query distributions
Abstract
Maximum inner product search (MIPS) over dense and sparse vectors has progressed independently in a bifurcated literature for decades; the latter is better known as top-k retrieval in Information Retrieval. This duality exists because sparse and dense vectors serve different end goals. That is despite the fact that they are manifestations of the same mathematical problem. In this work, we ask if algorithms for dense vectors could be applied effectively to sparse vectors, particularly those that violate the assumptions underlying top-k retrieval methods. We study IVF-based retrieval where vectors are partitioned into clusters and only a fraction of clusters are searched during retrieval. We conduct a comprehensive analysis of dimensionality reduction for sparse vectors, and examine standard and spherical KMeans for partitioning. Our experiments demonstrate that IVF serves as an…
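The IVF idea described in the abstract (partition the vectors into clusters, then search only a fraction of them per query) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `spherical_kmeans`, `build_ivf`, and `ivf_search`, the parameter `n_probe`, and the choice to hold sparse vectors in dense NumPy arrays are all assumptions made for brevity, and the dimensionality reduction step studied in the paper is omitted.

```python
import numpy as np


def spherical_kmeans(X, n_clusters, n_iters=10, seed=0):
    """Cluster rows of X by cosine similarity (hypothetical helper).

    Returns unit-norm centroids and the cluster id of every row.
    """
    rng = np.random.default_rng(seed)
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    Xn = X / np.maximum(norms, 1e-12)  # all-zero rows stay zero
    # Initialise centroids from distinct randomly chosen points.
    init = rng.choice(len(Xn), size=n_clusters, replace=False)
    centroids = Xn[init].copy()
    for _ in range(n_iters):
        assign = np.argmax(Xn @ centroids.T, axis=1)
        for c in range(n_clusters):
            members = Xn[assign == c]
            if len(members) > 0:
                mean = members.sum(axis=0)
                # Re-project the mean onto the unit sphere.
                centroids[c] = mean / max(np.linalg.norm(mean), 1e-12)
    assign = np.argmax(Xn @ centroids.T, axis=1)
    return centroids, assign


def build_ivf(X, n_clusters, seed=0):
    """Build one inverted list of row ids per cluster."""
    centroids, assign = spherical_kmeans(X, n_clusters, seed=seed)
    lists = [np.flatnonzero(assign == c) for c in range(n_clusters)]
    return centroids, lists


def ivf_search(q, X, centroids, lists, k, n_probe):
    """Approximate top-k MIPS: score only vectors in the n_probe best clusters."""
    # Rank clusters by the inner product between the query and each centroid.
    probe = np.argsort(-(centroids @ q))[:n_probe]
    candidates = np.concatenate([lists[c] for c in probe])
    # Exact inner products, restricted to the candidate set.
    scores = X[candidates] @ q
    top = np.argsort(-scores)[:k]
    return candidates[top]
```

With `n_probe` equal to the number of clusters, the search scores every vector and is exact; smaller values trade recall for speed, which is the trade-off an IVF index exposes. A practical system for sparse vectors would likely use a sparse storage format rather than dense arrays.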
Taxonomy
Topics: Advanced Image and Video Retrieval Techniques · Data Management and Algorithms · Image Retrieval and Classification Techniques
Methods: Pruning
