A novel approach for fast mining frequent itemsets use N-list structure based on MapReduce

Arkan A. G. Al-Hamodi; Songfeng Lu

arXiv:1704.04599·cs.DC·May 23, 2017·1 cites

TL;DR

This paper introduces HPrepost, a MapReduce-based algorithm that efficiently mines frequent itemsets with reduced runtime and memory, especially effective on dense, large datasets.

Contribution

It presents an improved Prepost algorithm utilizing Hadoop and MapReduce for faster, more memory-efficient frequent itemset mining on large datasets.
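The summary above does not say how HPrepost divides its work into jobs, so the following is only a minimal sketch of the generic MapReduce pattern for support counting, simulated in plain Python rather than with the Hadoop API. The function names (`map_phase`, `shuffle`, `reduce_phase`, `frequent_items`) and the `min_count` parameter are illustrative assumptions, not names from the paper.

```python
from collections import defaultdict
from itertools import chain


def map_phase(transaction):
    """Emit (item, 1) once for every distinct item in a transaction."""
    return [(item, 1) for item in set(transaction)]


def shuffle(pairs):
    """Group emitted values by key, as the framework would between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(item, counts, min_count):
    """Sum the counts for one item; keep it only if it meets the threshold."""
    total = sum(counts)
    return (item, total) if total >= min_count else None


def frequent_items(transactions, min_count):
    """Simulate one map/shuffle/reduce round that finds frequent single items."""
    pairs = chain.from_iterable(map_phase(t) for t in transactions)
    reduced = (reduce_phase(k, v, min_count) for k, v in shuffle(pairs).items())
    return dict(r for r in reduced if r is not None)
```

On a real cluster the map and reduce steps would run on separate nodes over partitions of the transaction database; here they are ordinary function calls so the data flow is easy to follow.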

Findings

01. HPrepost outperforms existing algorithms in runtime.

02. HPrepost uses less memory on dense datasets.

03. HPrepost is effective on large datasets and at small support thresholds.

Abstract

Frequent pattern mining is one of the most significant topics in data mining. In recent years, many algorithms have been proposed for mining frequent itemsets. One of them, the Prepost algorithm, mines frequent itemsets using the N-list data structure. The Prepost algorithm is enhanced by implementing a compact PPC-tree with a general tree. Prepost can find frequent itemsets only when the required pre-order and post-order codes are available for each node. In this chapter, we improve the Prepost algorithm on the Hadoop platform (HPrepost) using the MapReduce programming model. The main goal of the proposed method is to mine frequent itemsets efficiently, with less running time and memory usage. We have conducted experiments to compare the proposed scheme with other algorithms. With dense datasets, which have a large average length of transactions,…
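To make the PPC-tree and N-list terms concrete, here is a minimal single-machine sketch. It assumes the usual construction: frequent items are sorted by descending support, inserted into a prefix tree, and each node receives a pre-order and a post-order code. It is not the HPrepost implementation; all names (`Node`, `build_ppc_tree`, `item_n_lists`, `intersect`, `support`) are illustrative, and the quadratic `intersect` stands in for the linear merge an optimized version would use.

```python
from collections import Counter


class Node:
    """One PPC-tree node: an item, its count, and pre/post-order codes."""

    def __init__(self, item):
        self.item = item
        self.count = 0
        self.children = {}
        self.pre = self.post = -1


def build_ppc_tree(transactions, min_count):
    """Build a prefix tree over frequent items and assign traversal codes."""
    freq = Counter(item for t in transactions for item in set(t))
    frequent = sorted((i for i, c in freq.items() if c >= min_count),
                      key=lambda i: (-freq[i], i))
    rank = {item: r for r, item in enumerate(frequent)}
    root = Node(None)
    for t in transactions:
        node = root
        for item in sorted((i for i in set(t) if i in rank), key=rank.get):
            node = node.children.setdefault(item, Node(item))
            node.count += 1
    counters = {"pre": 0, "post": 0}

    def visit(node):
        # Pre-order code on entry, post-order code after all children.
        node.pre = counters["pre"]
        counters["pre"] += 1
        for child in node.children.values():
            visit(child)
        node.post = counters["post"]
        counters["post"] += 1

    visit(root)
    return root, rank


def item_n_lists(root):
    """Collect (pre, post, count) triples per item, sorted by pre-order code."""
    n_lists = {}
    stack = [root]
    while stack:
        node = stack.pop()
        if node.item is not None:
            n_lists.setdefault(node.item, []).append(
                (node.pre, node.post, node.count))
        stack.extend(node.children.values())
    for codes in n_lists.values():
        codes.sort()
    return n_lists


def intersect(ancestor_codes, descendant_codes):
    """N-list of a 2-itemset: ancestor codes carrying descendant counts."""
    merged = {}
    for pre_a, post_a, _ in ancestor_codes:
        for pre_d, post_d, count_d in descendant_codes:
            # A is an ancestor of D iff pre_a < pre_d and post_a > post_d.
            if pre_a < pre_d and post_a > post_d:
                key = (pre_a, post_a)
                merged[key] = merged.get(key, 0) + count_d
    return [(pre, post, c) for (pre, post), c in sorted(merged.items())]


def support(codes):
    """Support of an itemset is the sum of the counts in its N-list."""
    return sum(count for _, _, count in codes)
```

In `intersect`, the first argument must be the N-list of the item that comes earlier in the support ordering (the `rank` returned by `build_ppc_tree`), since only that item can appear as an ancestor in the tree.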

Taxonomy

Topics: Data Mining Algorithms and Applications · Data Management and Algorithms · Rough Sets and Fuzzy Logic