DartMinHash: Fast Sketching for Weighted Sets
Tobias Christiani

TL;DR
DartMinHash is a fast algorithm for weighted minwise hashing that outperforms previous methods in speed and scalability, especially on sparse data, and also speeds up hashing for approximate near neighbor search.
Contribution
The paper presents DartMinHash, a novel algorithm achieving faster weighted minhash computation with better scalability, improving upon existing algorithms like BagMinHash and ICWS.
Findings
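Achieves a lower expected running time than the previous state of the art, BagMinHash, and is the fastest weighted minhash algorithm for sparse data.
Provides 10x speedups over ICWS and BagMinHash in common use cases.
Yields fully independent locality-sensitive hash values for approximate near neighbor search under weighted Jaccard similarity in optimal expected time.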
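To make the near neighbor finding concrete, here is a minimal sketch of the standard locality-sensitive hashing index that such hash values would feed. It is not taken from the paper: the names `build_tables` and `query`, the parameters `L` (number of tables) and `K` (hash values concatenated per table key), and the toy sketches are all assumptions for illustration. Each item is assumed to already have a sketch of at least `L * K` hash values.

```python
from collections import defaultdict

def build_tables(sketches, L, K):
    """Build L hash tables; table t is keyed by the t-th block of K hash values.

    `sketches` maps an item id to a list of at least L*K hash values.
    (Hypothetical helper for illustration, not the paper's code.)
    """
    tables = [defaultdict(list) for _ in range(L)]
    for item_id, sketch in sketches.items():
        for t in range(L):
            key = tuple(sketch[t * K:(t + 1) * K])
            tables[t][key].append(item_id)
    return tables

def query(tables, sketch, K):
    """Return every item that shares a full K-value key with the query in some table."""
    candidates = set()
    for t, table in enumerate(tables):
        key = tuple(sketch[t * K:(t + 1) * K])
        candidates.update(table.get(key, []))
    return candidates

# Toy sketches with L = 2 tables and K = 2 values per key.
sketches = {"p": [1, 2, 3, 4], "q": [1, 2, 9, 9], "r": [7, 7, 7, 7]}
tables = build_tables(sketches, L=2, K=2)
near = query(tables, [1, 2, 0, 0], K=2)  # matches "p" and "q" in the first table
```

Building the keys needs `L * K` hash values per item, which is why the cost of computing many minhashes per set matters for this kind of search.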
Abstract
Weighted minwise hashing is a standard dimensionality reduction technique with applications to similarity search and large-scale kernel machines. We introduce a simple algorithm that takes a weighted set $x \in \mathbb{R}_{\geq 0}^{d}$ and computes $k$ independent minhashes in expected time $O(k \log k + \|x\|_0 \log(\|x\|_1 + 1/\|x\|_1))$, improving upon the state-of-the-art BagMinHash algorithm (KDD '18) and representing the fastest weighted minhash algorithm for sparse data. Our experiments show running times that scale better with $k$ and $\|x\|_0$ compared to ICWS (ICDM '10) and BagMinHash, obtaining 10x speedups in common use cases. Our approach also gives rise to a technique for computing fully independent locality-sensitive hash values for $(L, K)$-parameterized approximate near neighbor search under weighted Jaccard similarity in optimal…
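For readers new to weighted minhashing, the sketch below shows what such a hash estimates. It computes the exact weighted Jaccard similarity and then approximates it with a plain implementation of ICWS, the baseline named in the abstract. This is not DartMinHash: the baseline does work proportional to the number of nonzero entries for every one of the `k` hashes, which is the cost the paper sets out to reduce. The function names, the dictionary representation of weighted sets, and the string-based seeding are assumptions made for illustration.

```python
import math
import random

def weighted_jaccard(x, y):
    """Exact weighted Jaccard: sum of elementwise minima over sum of maxima."""
    keys = set(x) | set(y)
    num = sum(min(x.get(i, 0.0), y.get(i, 0.0)) for i in keys)
    den = sum(max(x.get(i, 0.0), y.get(i, 0.0)) for i in keys)
    return num / den if den > 0 else 0.0

def icws_sketch(x, k, seed=0):
    """k weighted minhashes via ICWS (baseline method, not DartMinHash).

    Randomness depends only on (seed, hash index, element), so two sets
    draw the same random values for a shared element.
    """
    sketch = []
    for j in range(k):
        best = None
        for i, w in x.items():
            if w <= 0:
                continue
            rng = random.Random(f"{seed}:{j}:{i}")
            r = rng.gammavariate(2.0, 1.0)
            c = rng.gammavariate(2.0, 1.0)
            beta = rng.random()
            t = math.floor(math.log(w) / r + beta)
            # log of a = c / (y * e^r) with y = exp(r * (t - beta))
            ln_a = math.log(c) - r * (t - beta) - r
            if best is None or ln_a < best[0]:
                best = (ln_a, i, t)
        sketch.append((best[1], best[2]) if best else None)
    return sketch

def estimate_similarity(s1, s2):
    """Fraction of positions where the two sketches agree."""
    return sum(a == b for a, b in zip(s1, s2)) / len(s1)

x = {"a": 1.0, "b": 2.0}
y = {"a": 1.0, "b": 1.0, "c": 1.0}
exact = weighted_jaccard(x, y)  # (1 + 1 + 0) / (1 + 2 + 1) = 0.5
approx = estimate_similarity(icws_sketch(x, 512), icws_sketch(y, 512))
```

The estimate converges to the exact value as `k` grows, with standard deviation about `sqrt(J * (1 - J) / k)` for true similarity `J`.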
Taxonomy
Topics: Video Analysis and Summarization · Advanced Image and Video Retrieval Techniques · Data Management and Algorithms
