An Approximate Algorithm for Maximum Inner Product Search over Streaming Sparse Vectors
Sebastian Bruch, Franco Maria Nardini, Amir Ingber, Edo Liberty

TL;DR
This paper introduces Sinnamon, an approximate algorithm for maximum inner product search over streaming, sparse, real-valued vectors. It targets the setting where the collection changes over time and vectors are arbitrarily distributed, a setting in which existing exact methods are no longer competitive.
Contribution
The paper presents Sinnamon, a novel approximate algorithm for top-k retrieval over streaming sparse vectors. It exposes adjustable trade-offs among memory, latency, and accuracy, and it outperforms existing algorithms when the collection is dynamic and the vectors are arbitrarily distributed.
Findings
Sinnamon offers adjustable trade-offs among memory footprint, query latency, and accuracy.
Theoretical bounds on approximation error are established.
Empirical results show Sinnamon's superior performance on synthetic and real datasets.
Abstract
Maximum Inner Product Search or top-k retrieval on sparse vectors is well-understood in information retrieval, with a number of mature algorithms that solve it exactly. However, all existing algorithms are tailored to text and frequency-based similarity measures. To achieve optimal memory footprint and query latency, they rely on the near stationarity of documents and on laws governing natural languages. We consider, instead, a setup in which collections are streaming -- necessitating dynamic indexing -- and where indexing and retrieval must work with arbitrarily distributed real-valued vectors. As we show, existing algorithms are no longer competitive in this setup, even against naive solutions. We investigate this gap and present a novel approximate solution, called Sinnamon, that can efficiently retrieve the top-k results for sparse real-valued vectors drawn from arbitrary…
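To make the problem setup concrete, here is a minimal sketch of the kind of naive exact solution the abstract alludes to. It is not Sinnamon and is not taken from the paper: the class name NaiveSparseIndex, its methods, and the dict-of-dicts layout are all assumptions chosen for illustration. It keeps one posting map per nonzero coordinate, supports insertions and deletions (the streaming part), and scores a query by accumulating partial inner products over the coordinates it shares with indexed vectors.

```python
import heapq
from collections import defaultdict


class NaiveSparseIndex:
    """Hypothetical exact baseline: top-k inner product over a mutable
    collection of sparse real-valued vectors (not the Sinnamon algorithm)."""

    def __init__(self):
        # coordinate -> {doc_id: value}, one posting map per nonzero coordinate
        self.postings = defaultdict(dict)
        # doc_id -> {coordinate: value}, kept so that deletions are possible
        self.docs = {}

    def insert(self, doc_id, vector):
        # Re-inserting an existing id replaces the old vector.
        if doc_id in self.docs:
            self.delete(doc_id)
        self.docs[doc_id] = dict(vector)
        for coord, value in vector.items():
            self.postings[coord][doc_id] = value

    def delete(self, doc_id):
        # Remove the vector from every posting map it appears in.
        for coord in self.docs.pop(doc_id, {}):
            del self.postings[coord][doc_id]
            if not self.postings[coord]:
                del self.postings[coord]

    def top_k(self, query, k):
        # Accumulate exact inner products coordinate by coordinate.
        # Vectors sharing no coordinate with the query (score 0) are omitted.
        scores = defaultdict(float)
        for coord, q_value in query.items():
            for doc_id, d_value in self.postings.get(coord, {}).items():
                scores[doc_id] += q_value * d_value
        return heapq.nlargest(k, scores.items(), key=lambda item: item[1])


index = NaiveSparseIndex()
index.insert("a", {0: 1.0, 3: -2.0})
index.insert("b", {3: 0.5, 7: 4.0})
index.insert("c", {0: 2.0})
index.delete("c")  # the collection changes over time
print(index.top_k({0: 1.0, 3: 1.0}, k=1))
```

Because values may be negative, an exact method of this kind has to touch every posting the query overlaps with before it can rank anything. That is a plausible reading of why the abstract treats memory, latency, and accuracy as quantities to be traded against one another in an approximate method.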
Taxonomy
Topics: Optimization and Search Problems · Advanced Image and Video Retrieval Techniques · Machine Learning and Algorithms
