Parallel algorithms for mining of frequent itemsets

Robert Kessl

arXiv:2108.05038·cs.DB·August 12, 2021

TL;DR

This paper presents a parallel algorithm for mining frequent itemsets that achieves significant speedup on distributed memory systems by estimating processor load from samples, applicable to arbitrary depth-first search algorithms.

Contribution

It introduces a novel parallel method for frequent itemset mining that improves speedup and is adaptable to existing sequential algorithms on distributed systems.

Findings

01 Achieves ~6x speedup on 10 processors

02 Uses approximate load estimation from database samples

03 Ensures complete frequent itemset discovery from the full database
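
Findings 02 and 03 can be illustrated with a small sketch. This summary does not describe the paper's actual estimation procedure, so everything below is an assumption: the function names are hypothetical, the "load" of a class is approximated by the number of frequent itemsets found in a random sample under each top-level item prefix, and the classes are assigned with a plain greedy longest-first heuristic rather than the author's method.

```python
import heapq
import random


def estimate_class_sizes(sample, min_percent):
    """Estimate work per top-level prefix class from a sample (assumed proxy:
    the number of frequent itemsets whose smallest item is that prefix)."""
    n = len(sample)
    # Vertical layout: item -> ids of the sampled baskets containing it.
    tidsets = {}
    for tid, basket in enumerate(sample):
        for item in basket:
            tidsets.setdefault(item, set()).add(tid)
    items = sorted(tidsets)

    def count(prefix_tids, start):
        # Depth-first count of frequent extensions of the current prefix.
        total = 0
        for i in range(start, len(items)):
            tids = prefix_tids & tidsets[items[i]]
            if len(tids) * 100 >= min_percent * n:
                total += 1 + count(tids, i + 1)
        return total

    sizes = {}
    for i, item in enumerate(items):
        if len(tidsets[item]) * 100 >= min_percent * n:
            sizes[item] = 1 + count(tidsets[item], i + 1)
    return sizes


def assign_classes(sizes, processors):
    """Greedy heuristic: give the largest remaining class to the least
    loaded processor. Returns {processor id: [prefix items]}."""
    heap = [(0, p) for p in range(processors)]
    heapq.heapify(heap)
    plan = {p: [] for p in range(processors)}
    for item, size in sorted(sizes.items(), key=lambda kv: (-kv[1], kv[0])):
        load, p = heapq.heappop(heap)
        plan[p].append(item)
        heapq.heappush(heap, (load + size, p))
    return plan


def plan_from_sample(baskets, min_percent, processors, sample_size, seed=0):
    """Draw a random sample, estimate class sizes on it, and build a plan."""
    rng = random.Random(seed)
    sample = rng.sample(baskets, min(sample_size, len(baskets)))
    return assign_classes(estimate_class_sizes(sample, min_percent), processors)
```

In a scheme like this, each processor would then mine its assigned prefix classes against the full database, so the sample would influence only how evenly the work is spread, not which itemsets are reported.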

Abstract

In the last decade, companies have started collecting large amounts of data. Without proper analysis, the data are usually useless. The field of analysing such data is called data mining. Unfortunately, the amount of data is quite large: the data do not fit into main memory and the processing time can become very long. Therefore, we need parallel data mining algorithms. One of the popular and important data mining algorithms is the algorithm for generating so-called frequent itemsets. The problem of mining frequent itemsets can be explained with the following example: customers go to a store and put some goods into their baskets; the owner of the store collects the baskets and wants to know the sets of goods that are bought together in at least p% of the baskets. Currently, the sequential algorithms for mining frequent itemsets perform quite well. However, the…
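
The basket example in the abstract can be made concrete with a minimal sequential sketch. It is not taken from the paper: the function name `frequent_itemsets`, the integer `min_percent` parameter (standing in for p), and the choice of a depth-first search over basket-id sets are all assumptions made for illustration.

```python
def frequent_itemsets(baskets, min_percent):
    """Return {itemset: count} for every itemset that occurs in at least
    min_percent percent of the baskets (hypothetical helper)."""
    n = len(baskets)
    # Vertical layout: item -> ids of the baskets that contain it.
    tidsets = {}
    for tid, basket in enumerate(baskets):
        for item in basket:
            tidsets.setdefault(item, set()).add(tid)
    items = sorted(tidsets)
    result = {}

    def dfs(prefix, prefix_tids, start):
        # Extend the prefix only with later items so each set is visited once.
        for i in range(start, len(items)):
            tids = prefix_tids & tidsets[items[i]]
            # Integer comparison avoids floating-point rounding of p%.
            if len(tids) * 100 >= min_percent * n:
                itemset = prefix + (items[i],)
                result[frozenset(itemset)] = len(tids)
                dfs(itemset, tids, i + 1)

    dfs((), set(range(n)), 0)
    return result
```

Calling `frequent_itemsets(baskets, 50)` on a list of sets of goods returns the sets bought together in at least half of the baskets. The search stops extending a prefix as soon as it becomes infrequent, because no superset of an infrequent set can be frequent.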

Taxonomy

Topics: Data Mining Algorithms and Applications · Algorithms and Data Compression · Data Management and Algorithms