Parallel algorithms for mining of frequent itemsets
Robert Kessl

TL;DR
This paper presents a parallel algorithm for mining frequent itemsets that achieves significant speedup on distributed memory systems by estimating processor load from samples, applicable to arbitrary depth-first search algorithms.
Contribution
It introduces a novel parallel method for frequent itemset mining that improves speedup and is adaptable to existing sequential algorithms on distributed systems.
Findings
Achieves ~6x speedup on 10 processors
Uses approximate load estimation from database samples
Ensures complete frequent itemset discovery from the full database
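The findings above can be illustrated with a rough sketch. This summary does not say how the paper estimates load or partitions the work, so everything below is an assumption made for illustration: the work is split into prefix classes (all itemsets sharing the same smallest item), the load of a class is approximated by the number of frequent itemsets it contains in a random sample, and classes are assigned to processors with a greedy "heaviest class first" rule. The function names `count_frequent_per_prefix` and `assign_classes` are hypothetical.

```python
import random
from itertools import combinations


def count_frequent_per_prefix(transactions, min_support):
    """Count frequent itemsets per prefix class (keyed by smallest item).

    Brute-force enumeration, used here only as a stand-in for a real
    depth-first miner; the count serves as an assumed proxy for load.
    """
    n = len(transactions)
    items = sorted(set().union(*transactions))
    load = {}
    for size in range(1, len(items) + 1):
        found = False
        for cand in combinations(items, size):
            cset = frozenset(cand)
            if sum(1 for t in transactions if cset <= t) >= min_support * n:
                # cand is sorted, so cand[0] identifies the prefix class
                load[cand[0]] = load.get(cand[0], 0) + 1
                found = True
        if not found:
            break  # no frequent itemset of this size, so none larger
    return load


def assign_classes(load, num_procs):
    """Greedy assignment: heaviest class goes to the least-loaded processor."""
    totals = [0] * num_procs
    assignment = {}
    for prefix, weight in sorted(load.items(), key=lambda kv: (-kv[1], kv[0])):
        p = totals.index(min(totals))
        assignment[prefix] = p
        totals[p] += weight
    return assignment, totals


# Toy database: 100 transactions built from four basket patterns.
database = [{"a", "b"}, {"a", "b", "c"}, {"a", "c"}, {"b"}] * 25

rng = random.Random(0)                 # fixed seed, for repeatability only
sample = rng.sample(database, 20)      # estimate load on a 20% sample
estimated_load = count_frequent_per_prefix(sample, min_support=0.5)
assignment, totals = assign_classes(estimated_load, num_procs=2)
```

In a scheme like this, the sample only decides who does which part of the work. Each processor would still mine its assigned classes against the full database, so the final result stays exact even when the load estimate is off.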
Abstract
In the recent decade, companies have started collecting large amounts of data. Without proper analysis, the data are usually useless. The field of analysing such data is called data mining. Unfortunately, the amount of data is quite large: the data do not fit into main memory and the processing time can become very long. Therefore, we need parallel data mining algorithms. One popular and important data mining algorithm is the algorithm for generating so-called frequent itemsets. The problem of mining frequent itemsets can be explained with the following example: customers go into a store and put some goods into their baskets; the owner of the store collects the baskets and wants to know the sets of goods that are bought together in at least p% of the baskets. Currently, the sequential algorithms for mining frequent itemsets perform quite well. However, the…
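The basket example in the abstract can be made concrete with a minimal sketch. It is a brute-force enumeration written for clarity, not one of the sequential or parallel algorithms the paper discusses; the function name `frequent_itemsets` and the toy baskets are made up for illustration, and `min_support` plays the role of p% expressed as a fraction.

```python
from itertools import combinations


def frequent_itemsets(baskets, min_support):
    """Return {itemset: support} for every itemset that occurs in at
    least a `min_support` fraction of the baskets (brute force)."""
    n = len(baskets)
    items = sorted(set().union(*baskets))
    result = {}
    for size in range(1, len(items) + 1):
        found_any = False
        for candidate in combinations(items, size):
            cset = frozenset(candidate)
            count = sum(1 for b in baskets if cset <= b)
            if count / n >= min_support:
                result[cset] = count / n
                found_any = True
        if not found_any:
            # Every subset of a frequent itemset is frequent, so if no
            # itemset of this size qualifies, no larger one can.
            break
    return result


baskets = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk"},
]
freq = frequent_itemsets(baskets, min_support=0.5)
for itemset, support in sorted(freq.items(), key=lambda kv: sorted(kv[0])):
    print(sorted(itemset), support)
```

With a 50% threshold, {bread, milk} qualifies because it appears in two of the four baskets, while {butter, milk} does not because it appears in only one.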
Taxonomy
Topics: Data Mining Algorithms and Applications · Algorithms and Data Compression · Data Management and Algorithms
