Performance Optimization of MapReduce-based Apriori Algorithm on Hadoop Cluster

Sudhakar Singh; Rakhi Garg; P K Mishra

arXiv:1807.06070 · cs.DC · July 18, 2018

Open Access

TL;DR

This paper introduces improved MapReduce-based Apriori algorithms that optimize performance by combining multiple passes and selectively skipping pruning, resulting in faster execution on Hadoop clusters.

Contribution

It proposes the VFPC and ETDPC algorithms, along with optimized versions of each, which improve efficiency and robustness over the existing combined-counting methods FPC and DPC.

Findings

01 Optimized algorithms reduce execution time significantly.

02 Skipping the pruning step in some passes decreases overall computation: counting the extra un-pruned candidates costs less than the pruning work it avoids.

03 Proposed methods outperform traditional approaches in experiments.

Abstract

Many techniques have been proposed to implement the Apriori algorithm on the MapReduce framework, but only a few have focused on performance improvement. The FPC (Fixed Passes Combined-counting) and DPC (Dynamic Passes Combined-counting) algorithms combine multiple passes of Apriori in a single MapReduce phase to reduce execution time. In this paper, we propose improved MapReduce-based Apriori algorithms, VFPC (Variable Size based Fixed Passes Combined-counting) and ETDPC (Elapsed Time based Dynamic Passes Combined-counting), over FPC and DPC. Further, we optimize the multi-pass phases of these algorithms by skipping the pruning step in some passes, and propose the Optimized-VFPC and Optimized-ETDPC algorithms. Quantitative analysis reveals that the counting cost of the additional un-pruned candidates produced by skipped pruning is less significant than the reduction in computation cost it brings.…
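
To make the trade-off in the abstract concrete, here is a minimal in-memory sketch of candidate generation for a combined (multi-pass) phase, with and without the pruning step. It is not the authors' implementation and does not run on Hadoop; the function names (`join`, `prune`, `combined_candidates`) and the toy itemsets are assumptions chosen for illustration. Itemsets are represented as sorted tuples.

```python
from itertools import combinations


def join(itemsets):
    """Apriori join step: merge two sorted k-itemsets that share their first k-1 items."""
    ordered = sorted(itemsets)
    joined = set()
    for i, a in enumerate(ordered):
        for b in ordered[i + 1:]:
            if a[:-1] == b[:-1]:
                joined.add(a + (b[-1],))
    return joined


def prune(candidates, previous):
    """Apriori prune step: keep a candidate only if every (k-1)-subset is in `previous`."""
    prev = set(previous)
    return {
        c for c in candidates
        if all(s in prev for s in combinations(c, len(c) - 1))
    }


def combined_candidates(frequent_k, passes, skip_pruning):
    """Generate candidates for up to `passes` consecutive passes in one phase.

    Later levels are built from the previous level's candidates (not from
    frequent itemsets), since support counts are not yet known inside the phase.
    """
    levels, current = [], set(frequent_k)
    for _ in range(passes):
        nxt = join(current)
        if not skip_pruning:
            nxt = prune(nxt, current)
        if not nxt:
            break
        levels.append(nxt)
        current = nxt
    return levels


# Hypothetical frequent 2-itemsets; (3, 4) is deliberately absent.
frequent_2 = {(1, 2), (1, 3), (1, 4), (2, 3), (2, 4)}
pruned = combined_candidates(frequent_2, passes=2, skip_pruning=False)
unpruned = combined_candidates(frequent_2, passes=2, skip_pruning=True)
print([len(level) for level in pruned], [len(level) for level in unpruned])
```

With pruning skipped, each level is a superset of the pruned level, so more candidates must be counted against the transactions, but the subset checks in `prune` are avoided. The abstract's claim is that, in the authors' analysis, the saved pruning work outweighs the extra counting.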

Taxonomy

Topics: Data Mining Algorithms and Applications · Cloud Computing and Resource Management · Data Management and Algorithms