Evaluation of Frequent Itemset Mining Platforms using Apriori and FP-Growth Algorithm
Ravi Ranjan, Aditi Sharma

TL;DR
This paper compares Hadoop, Spark, and Flink platforms in executing Apriori and FP-Growth algorithms for frequent itemset mining across various dataset scales to guide software selection for big data analysis.
Contribution
It provides an empirical evaluation of popular big data platforms using two key algorithms, highlighting their performance differences for practical decision-making.
Findings
Hadoop, Spark, and Flink show varying performance depending on dataset size.
Spark generally outperforms Hadoop and Flink in execution speed.
FP-Growth is faster than Apriori across all platforms.
Abstract
With overwhelming amounts of complex and heterogeneous data pouring in from anywhere, at any time, and from any device, we are undeniably in an era of Big Data. The emergence of Big Data as a disruptive technology for the next generation of intelligent systems has raised many questions about how to extract and use the knowledge contained in the data within a short time, on a limited budget, and under high rates of data generation. Companies are recognizing that big data can be used to make more accurate predictions and, with an appropriate association rule mining algorithm, to improve their business. To help these organizations decide which software and algorithm best suit their dataset, we compared the three best-known MapReduce-based platforms (Hadoop, Spark, and Flink) on two widely used algorithms, Apriori and FP-Growth, across datasets of different scales.
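For readers unfamiliar with frequent itemset mining, the level-wise idea behind Apriori can be sketched on a single machine as follows. This is a minimal illustration only, not the implementation evaluated in the paper: the function name `apriori`, the toy transactions, and the absolute `min_support` threshold are all assumptions made for the example, and none of the Hadoop, Spark, or Flink distribution logic is shown.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support count} for every frequent itemset.

    Hypothetical helper for illustration; min_support is an absolute
    count of transactions, not a fraction.
    """
    transactions = [frozenset(t) for t in transactions]

    # Level 1: count each individual item.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_support}
    frequent = dict(current)

    k = 2
    while current:
        # Join step: merge frequent (k-1)-itemsets into k-item candidates.
        candidates = set()
        for a, b in combinations(list(current), 2):
            union = a | b
            if len(union) != k:
                continue
            # Prune step: every (k-1)-subset must itself be frequent.
            if all(frozenset(sub) in current
                   for sub in combinations(union, k - 1)):
                candidates.add(union)

        # Count step: one pass over the data per level.
        counts = {c: sum(1 for t in transactions if c <= t)
                  for c in candidates}
        current = {s: c for s, c in counts.items() if c >= min_support}
        frequent.update(current)
        k += 1

    return frequent


# Toy data (assumed for the example, not taken from the paper).
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer"},
    {"milk", "diapers", "beer"},
    {"bread", "milk", "diapers"},
    {"bread", "milk", "beer"},
]
result = apriori(transactions, min_support=3)
```

With a threshold of 3, the four single items and the pair {bread, milk} are frequent. Apriori rescans the data once per itemset size, whereas FP-Growth builds a compact prefix tree and avoids candidate generation, which is the usual explanation for FP-Growth running faster.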
Taxonomy
Topics: Data Mining Algorithms and Applications · Imbalanced Data Classification Techniques · Artificial Intelligence in Healthcare
