Asymmetric Minwise Hashing
Anshumali Shrivastava, Ping Li

TL;DR
This paper introduces asymmetric minwise hashing (MH-ALSH), a scheme that cancels the bias of traditional minhash towards smaller sets, improving retrieval by set overlap (binary inner product) and set containment. Theoretical and empirical results show it performs better than traditional minhash.
Contribution
The paper proposes MH-ALSH, a hashing method built on asymmetric transformations that makes minhash better suited to retrieval by inner product and by set containment.
Findings
For retrieval with binary inner products, MH-ALSH is provably better than traditional minhash.
Experiments on high-dimensional datasets show significant improvements over traditional minhash.
The scheme is simple, effective, and easy to implement.
Abstract
Minwise hashing (Minhash) is a widely popular indexing scheme in practice. Minhash is designed for estimating set resemblance and is known to be suboptimal in many applications where the desired measure is set overlap (i.e., inner product between binary vectors) or set containment. Minhash has an inherent bias towards smaller sets, which adversely affects its performance in applications where such a penalization is not desirable. In this paper, we propose asymmetric minwise hashing (MH-ALSH) to provide a solution to this problem. The new scheme utilizes asymmetric transformations to cancel the bias of traditional minhash towards smaller sets, making the final "collision probability" monotonic in the inner product. Our theoretical comparisons show that, for the task of retrieving with binary inner products, asymmetric minhash is provably better than traditional minhash and other recently…
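To illustrate how an asymmetric transformation can cancel the small-set bias, here is a minimal sketch under an assumed construction (not necessarily the paper's exact transformation): every data set x and every query q is padded up to a common size M, but data-side dummies and query-side dummies come from disjoint id ranges, so dummies never collide. Both padded sets then have size M, the intersection stays a = |x ∩ q|, and the resemblance, which is the minhash collision probability, becomes a / (2M - a), which increases with a alone. The helpers pad_data and pad_query, the padding size M, and the dummy-id offset D are hypothetical names chosen for illustration; minhash itself is implemented with explicit random permutations.

```python
import random
from fractions import Fraction


def jaccard(a, b):
    """Exact resemblance |a & b| / |a | b|; this is what a minhash collision estimates."""
    return Fraction(len(a & b), len(a | b))


def pad_data(x, M, D):
    """Hypothetical data-side transform: add M - |x| dummy ids from [D, D + M)."""
    assert len(x) <= M, "M must be at least the largest set size"
    return set(x) | set(range(D, D + M - len(x)))


def pad_query(q, M, D):
    """Hypothetical query-side transform: add M - |q| dummy ids from [D + M, D + 2M).

    The range is disjoint from the data-side dummies, so padding never adds overlap.
    """
    assert len(q) <= M, "M must be at least the largest set size"
    return set(q) | set(range(D + M, D + 2 * M - len(q)))


def make_permutations(universe_size, k, seed=0):
    """k uniformly random permutations of range(universe_size) (classic minhash)."""
    rng = random.Random(seed)
    perms = []
    for _ in range(k):
        p = list(range(universe_size))
        rng.shuffle(p)
        perms.append(p)
    return perms


def minhash(s, perms):
    """Signature: the smallest permuted position of any element, one value per permutation."""
    return [min(p[e] for e in s) for p in perms]


def collision_rate(a, b, perms):
    """Fraction of permutations on which the two signatures agree (estimates jaccard)."""
    sig_a, sig_b = minhash(a, perms), minhash(b, perms)
    return sum(u == v for u, v in zip(sig_a, sig_b)) / len(perms)


if __name__ == "__main__":
    D, M = 1000, 100                                  # real ids < D; M >= every set size
    q = set(range(10))                                # query, |q| = 10
    x_small = set(range(5))                           # |x| = 5, overlap a = 5
    x_large = set(range(8)) | set(range(100, 190))    # |x| = 98, overlap a = 8
    perms = make_permutations(D + 2 * M, k=500)
    for name, x in [("small", x_small), ("large", x_large)]:
        px, pq = pad_data(x, M, D), pad_query(q, M, D)
        print(name, "overlap:", len(x & q),
              "plain:", float(jaccard(x, q)),
              "asymmetric:", float(jaccard(px, pq)),
              "estimated:", collision_rate(px, pq, perms))
```

In this example, plain resemblance ranks x_small (overlap 5) above x_large (overlap 8), 0.5 versus 0.08, because the larger set is penalized. After the assumed padding, the exact values are 5/195 ≈ 0.026 and 8/192 ≈ 0.042, so the ordering follows the overlap, as the abstract describes. The sampled collision rates fluctuate around these values.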
Taxonomy
Topics: Advanced Image and Video Retrieval Techniques · Algorithms and Data Compression · DNA and Biological Computing
