Performance Analysis of Apriori Algorithm with Different Data Structures on Hadoop Cluster
Sudhakar Singh, Rakhi Garg, P. K. Mishra

TL;DR
This paper evaluates the performance of three data structures (hash tree, trie, and hash table trie) for the Apriori algorithm implemented on Hadoop MapReduce, and shows that the hash table trie offers superior efficiency in big data mining tasks.
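The difference between a plain trie and a hash table trie can be sketched in Python as follows. This is a minimal illustration, assuming that the hash table trie simply replaces each node's list of children with a hash table (a Python `dict`). The class and function names (`ListTrieNode`, `HashTrieNode`, `build_trie`, `count_transaction`, `supports`) are hypothetical and are not taken from the paper. Both tries store candidate k-itemsets as root-to-depth-k paths and count the candidates contained in each transaction.

```python
class ListTrieNode:
    """Plain trie node: children are kept in a list and found by linear search."""

    def __init__(self):
        self.children = []  # list of (item, child) pairs
        self.count = 0      # support count, used at depth-k nodes

    def child(self, item):
        for key, node in self.children:
            if key == item:
                return node
        return None

    def add_child(self, item):
        node = self.child(item)
        if node is None:
            node = ListTrieNode()
            self.children.append((item, node))
        return node

    def items(self):
        return list(self.children)


class HashTrieNode:
    """Hash table trie node (assumed design): children kept in a dict for O(1) average lookup."""

    def __init__(self):
        self.children = {}  # item -> child
        self.count = 0

    def child(self, item):
        return self.children.get(item)

    def add_child(self, item):
        if item not in self.children:
            self.children[item] = HashTrieNode()
        return self.children[item]

    def items(self):
        return list(self.children.items())


def build_trie(candidates, node_cls):
    """Insert each candidate k-itemset as a path of its sorted items."""
    root = node_cls()
    for itemset in candidates:
        node = root
        for item in sorted(itemset):
            node = node.add_child(item)
    return root


def count_transaction(node, items, start, depth, k):
    """Increment every candidate k-itemset contained in items (sorted, no duplicates)."""
    if depth == k:
        node.count += 1
        return
    # Stop early when too few items remain to complete a k-itemset.
    for i in range(start, len(items) - (k - depth) + 1):
        child = node.child(items[i])
        if child is not None:
            count_transaction(child, items, i + 1, depth + 1, k)


def supports(root, k):
    """Collect {itemset: count} from the nodes at depth k."""
    out = {}

    def walk(node, prefix):
        if len(prefix) == k:
            out[tuple(prefix)] = node.count
            return
        for item, child in node.items():
            walk(child, prefix + [item])

    walk(root, [])
    return out


transactions = [["a", "b", "c"], ["a", "c"], ["b", "c", "d"], ["a", "b", "c", "d"]]
candidates = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d")]
for node_cls in (ListTrieNode, HashTrieNode):
    root = build_trie(candidates, node_cls)
    for t in transactions:
        count_transaction(root, sorted(set(t)), 0, 0, 2)
    print(node_cls.__name__, supports(root, 2))
```

Both variants produce identical counts; they differ only in how a node locates a child (linear scan versus hash lookup), which is plausibly where a hash table trie saves time when nodes have many children.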
Contribution
It implements and compares three variations of Apriori on Hadoop MapReduce, each built on a different data structure, and highlights the superior performance of the hash table trie for large-scale frequent itemset mining.
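For comparison, the hash tree variant can be sketched as below. This is a generic hash tree for candidate counting, not the authors' code: the branching factor (`branching=3`), the leaf capacity (`max_leaf=2`), and the bucket function `hash(item) % branching` are hypothetical choices. Interior nodes hash on the item at their depth, overfull leaves are split, and a per-transaction set of visited leaves prevents a candidate from being counted twice when two transaction items hash to the same bucket.

```python
from collections import defaultdict


class Node:
    def __init__(self):
        self.is_leaf = True
        self.children = {}  # bucket -> Node (interior nodes only)
        self.itemsets = []  # candidate itemsets (leaf nodes only)


class HashTree:
    """Hash tree holding candidate k-itemsets (sorted tuples) for support counting."""

    def __init__(self, k, branching=3, max_leaf=2):
        self.k = k
        self.branching = branching  # hypothetical: number of buckets per interior node
        self.max_leaf = max_leaf    # hypothetical: leaf capacity before a split
        self.root = Node()
        self.counts = defaultdict(int)

    def _bucket(self, item):
        return hash(item) % self.branching

    def insert(self, itemset):
        itemset = tuple(sorted(itemset))
        node, depth = self.root, 0
        # Descend by hashing the item at each depth until a leaf is reached.
        while not node.is_leaf:
            node = node.children.setdefault(self._bucket(itemset[depth]), Node())
            depth += 1
        node.itemsets.append(itemset)
        self._maybe_split(node, depth)

    def _maybe_split(self, node, depth):
        # A leaf at depth k has no item left to hash on, so it may stay overfull.
        if len(node.itemsets) <= self.max_leaf or depth >= self.k:
            return
        pending, node.itemsets, node.is_leaf = node.itemsets, [], False
        for itemset in pending:
            child = node.children.setdefault(self._bucket(itemset[depth]), Node())
            child.itemsets.append(itemset)
        for child in node.children.values():
            self._maybe_split(child, depth + 1)

    def count(self, transaction):
        items = sorted(set(transaction))
        self._visit(self.root, items, 0, set(items), set())

    def _visit(self, node, items, start, item_set, visited):
        if node.is_leaf:
            # Different hash paths can reach the same leaf; check each leaf once.
            if node not in visited:
                visited.add(node)
                for c in node.itemsets:
                    if item_set.issuperset(c):
                        self.counts[c] += 1
            return
        for i in range(start, len(items)):
            child = node.children.get(self._bucket(items[i]))
            if child is not None:
                self._visit(child, items, i + 1, item_set, visited)


tree = HashTree(k=2)
for c in [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4)]:
    tree.insert(c)
for t in [[1, 2, 3], [1, 4], [2, 3, 4], [1, 2, 3, 4]]:
    tree.count(t)
print(dict(tree.counts))
```

Unlike the tries, a leaf may hold several candidates that must each be checked with a subset test, and the number of leaves visited per transaction depends on the bucket function and the leaf capacity.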
Findings
Hash table trie outperforms trie and hash tree in execution time.
Hash tree performance degrades significantly on big datasets.
Experimental results confirm the efficiency of hash table trie in distributed data mining.
Abstract
Mining frequent itemsets from massive datasets has always been one of the most important problems in data mining. Apriori is the most popular and simplest algorithm for frequent itemset mining. To enhance the efficiency and scalability of Apriori, a number of algorithms have been proposed that address the design of efficient data structures, the minimization of database scans, and parallel and distributed processing. MapReduce is an emerging parallel and distributed technology for processing big datasets on a Hadoop cluster. To mine big datasets, it is essential to redesign data mining algorithms for this new paradigm. In this paper, we implement three variations of the Apriori algorithm on the MapReduce paradigm using the data structures hash tree, trie, and hash table trie (i.e., a trie with a hashing technique). We emphasize and investigate the significance of these three data structures for the Apriori algorithm on a Hadoop cluster,…
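One common way to map Apriori onto MapReduce is to run one job per pass k: mappers count candidate k-itemsets in their input splits, and reducers sum the counts and keep the frequent ones, which then seed candidate generation for pass k+1. The sketch below simulates that flow in plain Python rather than through Hadoop's Java API. It assumes this one-job-per-pass scheme, all function names are hypothetical, and the linear candidate scan in `mapper` marks the spot where a hash tree, trie, or hash table trie would be plugged in.

```python
from collections import defaultdict
from itertools import combinations


def mapper(split, candidates):
    """Map phase: emit (candidate, 1) for each candidate contained in a transaction.
    The linear scan over candidates is where a hash tree or trie would be used."""
    for transaction in split:
        items = set(transaction)
        for c in candidates:
            if items.issuperset(c):
                yield c, 1


def reducer(key, values, min_support):
    """Reduce phase: sum the partial counts and keep frequent itemsets."""
    total = sum(values)
    if total >= min_support:
        yield key, total


def run_pass(splits, candidates, min_support):
    """Simulate one MapReduce job: map every split, shuffle by key, then reduce."""
    groups = defaultdict(list)
    for split in splits:  # each split would go to a separate mapper
        for key, value in mapper(split, candidates):
            groups[key].append(value)  # the shuffle groups values by key
    frequent = {}
    for key, values in groups.items():
        for itemset, total in reducer(key, values, min_support):
            frequent[itemset] = total
    return frequent


def apriori_gen(frequent_prev, k):
    """Join frequent (k-1)-itemsets sharing a (k-2)-prefix, then prune any
    candidate that has an infrequent (k-1)-subset (the Apriori property)."""
    prev = sorted(frequent_prev)
    prev_set = set(prev)
    candidates = []
    for i in range(len(prev)):
        for j in range(i + 1, len(prev)):
            a, b = prev[i], prev[j]
            if a[:-1] == b[:-1]:
                c = a + (b[-1],)  # stays sorted because prev is sorted
                if all(s in prev_set for s in combinations(c, k - 1)):
                    candidates.append(c)
    return candidates


def apriori_mapreduce(splits, min_support):
    # Pass 1: the candidates are all single items.
    items = sorted({item for split in splits for t in split for item in t})
    frequent = run_pass(splits, [(i,) for i in items], min_support)
    result = dict(frequent)
    k = 2
    while frequent:
        candidates = apriori_gen(frequent, k)
        if not candidates:
            break
        frequent = run_pass(splits, candidates, min_support)
        result.update(frequent)
        k += 1
    return result


# Two input splits, as if the transaction file were divided between two mappers.
splits = [[["a", "b", "c"], ["a", "c"]], [["b", "c", "d"], ["a", "b", "c", "d"]]]
print(apriori_mapreduce(splits, min_support=2))
```

With `min_support=2`, the pair `("a", "d")` is dropped after the second pass, and no 4-itemset candidates survive the join step, so the loop stops after the third pass.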
Taxonomy
Topics: Data Mining Algorithms and Applications · Artificial Intelligence in Healthcare · Machine Learning and Data Classification
