Efficient Inner Product Approximation in Hybrid Spaces
Xiang Wu, Ruiqi Guo, David Simcha, Dave Dopson, Sanjiv Kumar

TL;DR
This paper introduces a method for fast, accurate inner product approximation in hybrid spaces that combine sparse and dense components, reporting over 10x search speedup on large-scale datasets while maintaining high accuracy.
Contribution
It presents a new technique and data structures for efficient inner product approximation in hybrid spaces, addressing a gap in existing methods for high-dimensional, heterogeneous data.
Findings
Achieves over 10x speedup in large-scale datasets
Maintains high accuracy in hybrid space search
Demonstrates effectiveness on billion-dimensional data
Abstract
Many emerging use cases of data mining and machine learning operate on large datasets with data from heterogeneous sources, specifically with both sparse and dense components. For example, dense deep neural network embedding vectors are often used in conjunction with sparse textual features to provide high dimensional hybrid representation of documents. Efficient search in such hybrid spaces is very challenging as the techniques that perform well for sparse vectors have little overlap with those that work well for dense vectors. Popular techniques like Locality Sensitive Hashing (LSH) and its data-dependent variants also do not give good accuracy in high dimensional hybrid spaces. Even though hybrid scenarios are becoming more prevalent, currently there exist no efficient techniques in literature that are both fast and accurate. In this paper, we propose a technique that approximates…
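To make the setting in the abstract concrete, here is a minimal sketch of the exact (brute-force) computation that an approximation technique would aim to speed up. It is not the paper's method. It only illustrates that the inner product of two hybrid vectors splits into a sparse term plus a dense term. The type aliases and function names (`SparseVec`, `HybridVec`, `hybrid_dot`, `brute_force_search`) are hypothetical and chosen for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical representation, assumed for illustration only:
# the sparse part maps dimension index -> nonzero value,
# the dense part is a fixed-length list of floats.
SparseVec = Dict[int, float]
HybridVec = Tuple[SparseVec, List[float]]


def sparse_dot(a: SparseVec, b: SparseVec) -> float:
    """Inner product of two sparse vectors; iterates over the smaller one."""
    if len(a) > len(b):
        a, b = b, a
    return sum(value * b[index] for index, value in a.items() if index in b)


def dense_dot(a: List[float], b: List[float]) -> float:
    """Inner product of two dense vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("dense parts must have the same dimensionality")
    return sum(x * y for x, y in zip(a, b))


def hybrid_dot(q: HybridVec, x: HybridVec) -> float:
    """Exact hybrid inner product: sparse term plus dense term."""
    return sparse_dot(q[0], x[0]) + dense_dot(q[1], x[1])


def brute_force_search(query: HybridVec, database: List[HybridVec], k: int = 1) -> List[int]:
    """Return indices of the k database vectors with the largest inner product."""
    scored = [(hybrid_dot(query, x), idx) for idx, x in enumerate(database)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [idx for _, idx in scored[:k]]
```

The sparse and dense terms favor different data structures (for example, index-based lookups for the sparse part and contiguous arrays for the dense part), which is one way to read the abstract's remark that techniques for the two kinds of vectors have little overlap. This exhaustive scan costs time linear in the database size, so it serves only as a correctness reference.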
Taxonomy
Topics: Advanced Image and Video Retrieval Techniques · Image Retrieval and Classification Techniques · Advanced Neural Network Applications
