A Comparison of Big Data Frameworks on a Layered Dataflow Model

Claudia Misale; Maurizio Drocco; Marco Aldinucci; Guy Tremblay

arXiv:1606.05293 · cs.DC · June 17, 2016

Open Access

TL;DR

This paper compares Big Data frameworks through a layered Dataflow model, showing that they share the same expressiveness at different levels of abstraction and giving a unified view of their abstractions.

Contribution

The paper introduces a layered Dataflow model that is at least as general as existing batch and streaming frameworks such as Spark, Flink, and Storm, which makes applications written in those frameworks easier to understand and compare.

Findings

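01. The proposed model is at least as general as existing batch and streaming frameworks (e.g., Spark, Flink, Storm).

02. It gives a single way to understand both batch and streaming tools.

03. Tools and applications that follow the Dataflow paradigm can be represented in the layered model.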

Abstract

In the world of Big Data analytics, there is a series of tools aiming at simplifying programming applications to be executed on clusters. Although each tool claims to provide better programming, data and execution models, for which only informal (and often confusing) semantics is generally provided, all share a common underlying model, namely, the Dataflow model. The Dataflow model we propose shows how various tools share the same expressiveness at different levels of abstraction. The contribution of this work is twofold: first, we show that the proposed model is (at least) as general as existing batch and streaming frameworks (e.g., Spark, Flink, Storm), thus making it easier to understand high-level data-processing applications written in such frameworks. Second, we provide a layered model that can represent tools and applications following the Dataflow paradigm and we show how the…

Taxonomy

Topics: Cloud Computing and Resource Management · Distributed and Parallel Computing Systems · Business Process Modeling and Analysis