Frequent-Itemset Mining using Locality-Sensitive Hashing
Debajyoti Bera, Rameshwar Pratap

TL;DR
This paper explores the use of Locality-Sensitive Hashing (LSH) to improve frequent itemset mining by reducing I/O operations and candidate generation, proposing randomized Apriori variants based on LSH techniques.
Contribution
It introduces novel randomized Apriori algorithms utilizing asymmetric LSH over Hamming distance and Jaccard similarity to enhance efficiency.
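For readers unfamiliar with LSH over Jaccard similarity, the following is a minimal sketch of MinHash, the standard LSH family for that measure. It is not the paper's asymmetric scheme. The helper names (`_h`, `minhash_signature`, `estimate_jaccard`), the choice of BLAKE2b as the hash, and the idea of hashing sets of transaction IDs are assumptions made for illustration only.

```python
import hashlib

def _h(seed, item):
    # Hypothetical helper: a seeded 64-bit hash built from BLAKE2b.
    data = f"{seed}:{item}".encode()
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")

def minhash_signature(items, num_hashes=128):
    # One minimum per seeded hash function; `items` must be non-empty.
    return [min(_h(seed, x) for x in items) for seed in range(num_hashes)]

def estimate_jaccard(sig_a, sig_b):
    # The fraction of matching positions estimates |A ∩ B| / |A ∪ B|.
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)

# Assumed example: the sets of transaction IDs in which two items occur.
tids_x = {1, 2, 3, 4, 5, 6, 7, 8}
tids_y = {3, 4, 5, 6, 7, 8, 9, 10}
estimate = estimate_jaccard(minhash_signature(tids_x), minhash_signature(tids_y))
```

The true Jaccard similarity of the two example sets is 6/10. The estimate approaches that value as `num_hashes` grows.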
Findings
Reduced I/O operations in frequent itemset mining
Lowered candidate generation through LSH-based methods
Maintained comparable accuracy with traditional Apriori
Abstract
The Apriori algorithm is a classical algorithm for the frequent itemset mining problem. A significant bottleneck in Apriori is the number of I/O operations involved and the number of candidates it generates. We investigate the role of LSH techniques in overcoming these problems without adding much computational overhead. We propose randomized variations of Apriori that are based on asymmetric LSH defined over Hamming distance and Jaccard similarity.
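To make the two bottlenecks named in the abstract concrete, here is a minimal sketch of the classical Apriori baseline, not the randomized variants the paper proposes. The function name `apriori`, the absolute-count `min_support` parameter, and the toy transactions are assumptions for illustration.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {itemset: support count} for every frequent itemset."""
    transactions = [frozenset(t) for t in transactions]
    # Level 1: count single items in one pass over the data.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: c for s, c in counts.items() if c >= min_support}
    result = dict(frequent)
    k = 2
    while frequent:
        prev = set(frequent)
        # Join step: combine frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in prev for s in combinations(c, k - 1))}
        # Count step: one full pass over the transactions per level.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        frequent = {s: c for s, c in counts.items() if c >= min_support}
        result.update(frequent)
        k += 1
    return result

# Toy data, assumed for illustration.
baskets = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
frequent_itemsets = apriori(baskets, min_support=3)
```

The count step rescans every transaction once per level, which is the I/O cost the abstract refers to. The join step is where the candidate set can grow large.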
Taxonomy
Topics: Data Mining Algorithms and Applications · Data Management and Algorithms · Algorithms and Data Compression
