2-Bit Random Projections, NonLinear Estimators, and Approximate Near Neighbor Search
Ping Li, Michael Mitzenmacher, Anshumali Shrivastava

TL;DR
This paper introduces a 2-bit random projection coding scheme with nonlinear estimators for efficient similarity estimation and near neighbor search, demonstrating that only a few bits are needed for high accuracy in large-scale data applications.
Contribution
The paper develops a simple 2-bit coding scheme and nonlinear estimators for similarity, improving efficiency in near neighbor search and hash table construction.
Findings
2-bit coding is effective for high similarity levels
Only 1-3 bits are needed for accurate similarity estimation
Experimental results confirm practical efficiency of the scheme
Abstract
The method of random projections has become a standard tool for machine learning, data mining, and search with massive data at Web scale. The effective use of random projections requires efficient coding schemes for quantizing (real-valued) projected data into integers. In this paper, we focus on a simple 2-bit coding scheme. In particular, we develop accurate nonlinear estimators of data similarity based on the 2-bit strategy. This work will have important practical applications. For example, in the task of near neighbor search, a crucial step (often called re-ranking) is to compute or estimate data similarities once a set of candidate data points have been identified by hash table techniques. This re-ranking step can take advantage of the proposed coding scheme and estimator. As a related task, in this paper, we also study a simple uniform quantization scheme for the purpose of…
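As a rough illustration of the idea in the abstract, the following Python sketch quantizes Gaussian random projections into 2-bit codes and recovers a similarity estimate from them. Several things here are assumptions rather than details from the text above: the four regions are split at -w, 0, and w with w = 0.75, the input vectors are unit-norm, and all function names are hypothetical. The estimator shown is the standard sign-agreement (1-bit) formula applied to the high bit of each code, used as a simple stand-in; it is not the paper's nonlinear 2-bit estimator.

```python
import math
import random


def two_bit_code(x, w=0.75):
    """Map a projected value to one of four regions (assumed layout):
    0: (-inf, -w), 1: [-w, 0), 2: [0, w), 3: [w, inf)."""
    if x < -w:
        return 0
    if x < 0:
        return 1
    if x < w:
        return 2
    return 3


def project_and_code(u, v, k, w=0.75, seed=0):
    """Project u and v onto the same k Gaussian directions and 2-bit code each result."""
    rng = random.Random(seed)
    codes_u, codes_v = [], []
    for _ in range(k):
        # One shared random direction per projection, with i.i.d. N(0, 1) entries.
        r = [rng.gauss(0.0, 1.0) for _ in u]
        codes_u.append(two_bit_code(sum(a * b for a, b in zip(r, u)), w))
        codes_v.append(two_bit_code(sum(a * b for a, b in zip(r, v)), w))
    return codes_u, codes_v


def sign_estimate(codes_u, codes_v):
    """Stand-in estimator: uses only the sign information in each code.
    For Gaussian projections, P(signs agree) = 1 - theta / pi, where theta is
    the angle between the vectors, so cos(pi * (1 - p)) estimates the cosine."""
    agree = sum((a >= 2) == (b >= 2) for a, b in zip(codes_u, codes_v))
    p = agree / len(codes_u)
    return math.cos(math.pi * (1.0 - p))


if __name__ == "__main__":
    rho = 0.9
    u = [1.0, 0.0, 0.0, 0.0]
    v = [rho, math.sqrt(1.0 - rho * rho), 0.0, 0.0]  # unit norm, cosine rho with u
    cu, cv = project_and_code(u, v, k=4000)
    print("estimated similarity:", sign_estimate(cu, cv))
```

Each code fits in 2 bits, so k projections of one vector occupy 2k bits. An estimator that also uses the magnitude bit, as the paper proposes, would be expected to extract more information from the same codes than this sign-only stand-in.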
Taxonomy
Topics: Advanced Image and Video Retrieval Techniques · Algorithms and Data Compression · Image Retrieval and Classification Techniques
