Benchmarking Apache Spark and Hadoop MapReduce on Big Data Classification

Taha Tekdogan; Ali Cakmak

arXiv:2209.10637 · cs.DC · September 23, 2022

TL;DR

This paper compares Apache Spark and Hadoop MapReduce for classification tasks in Big Data, evaluating performance, accuracy, and scalability to guide tool selection.

Contribution

It provides a comprehensive benchmark of the two frameworks on a specific task, classification, using multiple metrics: execution time, accuracy, and scalability.

Findings

01. Spark is 5 times faster than MapReduce in training.
02. Spark's performance degrades with larger input workloads.
03. MapReduce achieves slightly better accuracy (~3%) than Spark.
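
As a rough illustration of how figures like "5 times faster" and "~3%" are typically derived from raw measurements, here is a minimal Python sketch. It is not the authors' benchmarking code: the function names are hypothetical, the input numbers are placeholders rather than results from the paper, and reading "~3%" as an absolute difference in percentage points is an assumption.

```python
def speedup(baseline_seconds: float, candidate_seconds: float) -> float:
    """Return how many times faster the candidate ran than the baseline."""
    if baseline_seconds <= 0 or candidate_seconds <= 0:
        raise ValueError("timings must be positive")
    return baseline_seconds / candidate_seconds


def accuracy_gap_points(acc_a: float, acc_b: float) -> float:
    """Return acc_a minus acc_b in percentage points (inputs are fractions in [0, 1])."""
    return (acc_a - acc_b) * 100.0


# Placeholder values for illustration only, NOT measurements from the paper.
mapreduce_train_s = 500.0
spark_train_s = 100.0
mapreduce_acc = 0.90
spark_acc = 0.87

print(speedup(mapreduce_train_s, spark_train_s))  # 5.0
print(round(accuracy_gap_points(mapreduce_acc, spark_acc), 2))
```

Reporting the speedup as a ratio of wall-clock times and the accuracy gap as a difference keeps the two findings comparable across dataset sizes, which is where the scalability finding comes in.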

Abstract

Most of the popular Big Data analytics tools evolved to adapt their working environment to extract valuable information from a vast amount of unstructured data. The ability of data mining techniques to filter this helpful information from Big Data led to the term Big Data Mining. Shifting the scope of data from small-size, structured, and stable data to huge volume, unstructured, and quickly changing data brings many data management challenges. Different tools cope with these challenges in their own way due to their architectural limitations. There are numerous parameters to take into consideration when choosing the right data management framework based on the task at hand. In this paper, we present a comprehensive benchmark for two widely used Big Data analytics tools, namely Apache Spark and Hadoop MapReduce, on a common data mining task, i.e., classification. We employ several…

Code & Models

Repositories

tekdogan/iccbdc-21 (official)
