
TL;DR
This paper introduces 'sign-full' random projections that improve cosine similarity estimation over traditional 1-bit methods, especially at high similarity levels, by using expectation-based estimators and normalization techniques.
Contribution
It develops novel estimators of cosine similarity that combine the sign of one projection with the full value of the other, improving accuracy over sign-only methods, and it provides practical normalization strategies.
Findings
Estimated cosine similarity has lower variance with sign-full projections.
Normalized estimators outperform sign-sign estimators at high similarity.
At high similarity, the variance drops to about 40% of that of the sign-sign estimator.
Abstract
The method of 1-bit ("sign-sign") random projections has been a popular tool for efficient search and machine learning on large datasets. Given two $D$-dim data vectors $u$, $v \in \mathbb{R}^D$, one can generate $x = \sum_{i=1}^D u_i r_i$ and $y = \sum_{i=1}^D v_i r_i$, where $r_i \sim N(0,1)$ iid. The "collision probability" is $\Pr(\mathrm{sgn}(x) = \mathrm{sgn}(y)) = 1 - \frac{\cos^{-1}\rho}{\pi}$, where $\rho = \rho(u,v)$ is the cosine similarity. We develop "sign-full" random projections by estimating $\rho$ from (e.g.,) the expectation $E(\mathrm{sgn}(x)\,y) = \sqrt{\frac{2}{\pi}}\,\rho$, which can be further substantially improved by normalizing $y$. For nonnegative data, we recommend an interesting estimator based on $E(y_- 1_{x \geq 0} + y_+ 1_{x < 0})$ and its normalized version. The recommended estimator almost matches the accuracy of the (computationally expensive) maximum likelihood…
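As a rough illustration of the two basic ideas in the abstract, the sketch below estimates the cosine similarity once from the sign-sign collision probability and once from the sign-full expectation of sgn(x) times y. It is not the paper's implementation: the function name `estimate_cosine` and the parameters `k` (number of projections) and `seed` are hypothetical, and the inputs are assumed to be scaled to unit norm so that each projection has unit variance. The normalized and nonnegative-data estimators that the abstract recommends are not shown.

```python
import numpy as np


def estimate_cosine(u, v, k=4096, seed=0):
    """Return (sign-sign estimate, basic sign-full estimate) of cos(u, v).

    Illustrative sketch only; names and defaults are assumptions.
    """
    rng = np.random.default_rng(seed)

    # Assumption: scale to unit norm so x and y each have unit variance.
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    u = u / np.linalg.norm(u)
    v = v / np.linalg.norm(v)

    # k independent Gaussian projections, shared between u and v.
    R = rng.standard_normal((u.size, k))
    x = u @ R
    y = v @ R

    # Sign-sign: invert the collision probability 1 - arccos(rho) / pi.
    p_hat = np.mean(np.sign(x) == np.sign(y))
    rho_sign_sign = np.cos(np.pi * (1.0 - p_hat))

    # Sign-full: E[sgn(x) * y] = sqrt(2 / pi) * rho for unit-variance projections.
    rho_sign_full = np.sqrt(np.pi / 2.0) * np.mean(np.sign(x) * y)

    return rho_sign_sign, rho_sign_full
```

Calling `estimate_cosine(u, v)` on two vectors of equal length returns both estimates, which can be compared with the exact cosine `u @ v / (np.linalg.norm(u) * np.linalg.norm(v))`. This unnormalized sign-full estimator is only the starting point; the abstract attributes the main accuracy gains to normalizing y.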
Taxonomy
Topics: Bayesian Methods and Mixture Models · Statistical Methods and Inference · Random Matrices and Applications
