Minwise-Independent Permutations with Insertion and Deletion of Features
Rameshwar Pratap, Raghav Kulkarni

TL;DR
This paper studies how to adapt minHash sketches efficiently when features are dynamically inserted or deleted, providing algorithms with theoretical guarantees and practical speed-ups for real-world datasets.
Contribution
It introduces algorithms for updating minHash sketches under dynamic feature insertion and deletion, presents a systematic study of their theoretical properties, and demonstrates practical efficiency and accuracy improvements.
Findings
Significant speed-up in updating minHash sketches with feature changes.
Algorithms maintain comparable accuracy to recomputing minHash from scratch.
The methods are efficient, accurate, and easy to implement.
Abstract
In their seminal work, Broder et al. [BroderCFM98] introduced the minHash algorithm, which computes a low-dimensional sketch of high-dimensional binary data that closely approximates pairwise Jaccard similarity. Since its invention, minHash has been commonly used by practitioners in various big data applications. Further, data is dynamic in many real-life scenarios, and its feature set evolves over time. We consider the case when features are dynamically inserted and deleted in the dataset. We note that a naive solution to this problem is to repeatedly recompute minHash with respect to the updated dimension. However, this is an expensive task as it requires generating fresh random permutations. To the best of our knowledge, no systematic study of minHash is recorded in the context of dynamic insertion and deletion of…
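To make the setting concrete, the following is a minimal sketch of standard minHash together with the naive baseline the abstract describes: regenerating permutations and recomputing the sketch whenever the dimension changes. It is not the paper's update algorithm, which is not described in this summary. All names (make_permutations, minhash, estimate_jaccard, naive_update) and the use of explicit permutation lists are assumptions made for illustration.

```python
import random


def make_permutations(dim, k, seed=0):
    """Generate k random permutations of the feature indices 0..dim-1."""
    rng = random.Random(seed)
    perms = []
    for _ in range(k):
        p = list(range(dim))
        rng.shuffle(p)
        perms.append(p)
    return perms


def minhash(features, perms):
    """Sketch a non-empty set of feature indices (the positions of the 1s).

    For each permutation, keep the smallest permuted index among the
    features that are present.
    """
    return [min(p[i] for i in features) for p in perms]


def estimate_jaccard(sketch_a, sketch_b):
    """Fraction of sketch positions on which the two sketches agree."""
    matches = sum(a == b for a, b in zip(sketch_a, sketch_b))
    return matches / len(sketch_a)


def jaccard(a, b):
    """Exact Jaccard similarity of two sets."""
    return len(a & b) / len(a | b)


def naive_update(features, new_dim, k, seed=0):
    """Naive baseline: draw fresh permutations over the new dimension
    and recompute the whole sketch from scratch."""
    return minhash(features, make_permutations(new_dim, k, seed))


if __name__ == "__main__":
    dim, k = 100, 500
    a = set(range(0, 60))
    b = set(range(30, 90))
    perms = make_permutations(dim, k)
    print("exact:   ", jaccard(a, b))
    print("estimate:", estimate_jaccard(minhash(a, perms), minhash(b, perms)))

    # A new feature (index 100) is inserted and is present in point a.
    a_new = a | {dim}
    sketch = naive_update(a_new, dim + 1, k, seed=1)
    print("sketch length after naive recompute:", len(sketch))
```

The cost of naive_update grows with both the number of permutations and the dimension, since every permutation is regenerated; this is the expense the abstract says the paper aims to avoid.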
Taxonomy
Topics: Advanced Image and Video Retrieval Techniques · Algorithms and Data Compression · Data Management and Algorithms
