Technical Report: On the Usability of Hadoop MapReduce, Apache Spark & Apache Flink for Data Science
Bilal Akil, Ying Zhou, Uwe Röhm

TL;DR
This study compares Hadoop MapReduce, Apache Spark, and Apache Flink in terms of usability for data science, finding Spark and Flink more user-friendly than MapReduce, with no significant difference between Spark and Flink.
Contribution
The paper provides an empirical usability comparison of three major distributed data processing platforms, specifically for data science tasks.
Findings
Spark and Flink are preferred over MapReduce.
No significant difference in preference or development time between Spark and Flink.
The study explores factors influencing platform effectiveness for data science.
Abstract
Distributed data processing platforms for cloud computing are important tools for large-scale data analytics. Apache Hadoop MapReduce has become the de facto standard in this space, though its programming interface is relatively low-level, requiring many implementation steps even for simple analysis tasks. This has led to the development of more advanced dataflow-oriented platforms, most prominently Apache Spark and Apache Flink. These platforms not only aim to improve performance through better in-memory processing but, in particular, provide built-in high-level data processing functionality, such as filtering and join operators, which should make data analysis tasks easier to develop than with plain Hadoop MapReduce. But is this indeed the case? This paper compares three prominent distributed data processing platforms, Apache Hadoop MapReduce, Apache Spark, and Apache Flink, from a…
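To make the abstract's contrast concrete, here is a minimal plain-Python sketch. It is not Hadoop, Spark, or Flink code and uses none of their APIs. The toy data and all function names are hypothetical. The MapReduce-style version hand-codes a reduce-side join: the mapper tags records, a shuffle step groups them by key, and the reducer performs both the join and the filter. The dataflow-style version expresses the same query as a filter followed by a join, mirroring the built-in operators the abstract attributes to Spark and Flink.

```python
from collections import defaultdict
from typing import Dict, Iterator, List, Tuple

# Hypothetical toy inputs: (user_id, country) and (user_id, amount) records.
USERS = [(1, "AU"), (2, "DE"), (3, "AU")]
ORDERS = [(1, 30.0), (2, 12.5), (1, 5.0), (3, 80.0)]


# --- MapReduce style: map, shuffle and reduce are written by hand ---
def map_phase(users, orders) -> Iterator[Tuple[int, Tuple[str, object]]]:
    # Tag each record with its source so the reducer can tell them apart.
    for uid, country in users:
        yield uid, ("U", country)
    for uid, amount in orders:
        yield uid, ("O", amount)


def shuffle(pairs) -> Dict[int, List[Tuple[str, object]]]:
    # Group all values by key, as the framework does between map and reduce.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups, country: str) -> List[Tuple[int, str, float]]:
    out = []
    for uid, values in groups.items():
        countries = [v for tag, v in values if tag == "U"]
        amounts = [v for tag, v in values if tag == "O"]
        # Both the join and the filter are hand-coded inside the reducer.
        for c in countries:
            if c == country:
                out.extend((uid, c, a) for a in amounts)
    return sorted(out)


def mapreduce_join(users, orders, country: str) -> List[Tuple[int, str, float]]:
    return reduce_phase(shuffle(map_phase(users, orders)), country)


# --- Dataflow style: filter and join as (simulated) built-in operators ---
def dataflow_join(users, orders, country: str) -> List[Tuple[int, str, float]]:
    selected = {uid: c for uid, c in users if c == country}  # filter
    # Join orders with the selected users on user_id.
    return sorted((uid, selected[uid], a) for uid, a in orders if uid in selected)
```

Both functions return the same rows. For example, `mapreduce_join(USERS, ORDERS, "AU")` and `dataflow_join(USERS, ORDERS, "AU")` each yield `[(1, "AU", 5.0), (1, "AU", 30.0), (3, "AU", 80.0)]`. The difference lies in how much plumbing the developer has to write, which is the usability gap the study sets out to measure.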
Taxonomy
Topics: Cloud Computing and Resource Management · Big Data and Business Intelligence · Data Stream Mining Techniques
