Automatically Leveraging MapReduce Frameworks for Data-Intensive Applications

Maaz Bin Safeer Ahmad; Alvin Cheung

arXiv:1801.09802·cs.DB·June 20, 2018

TL;DR

Casper is an automated tool that translates sequential Java programs into code for MapReduce frameworks, so that data-intensive applications can benefit from distributed processing without being rewritten by hand.

Contribution

It introduces a novel automated translation approach that combines program synthesis and theorem proving to convert sequential code into the MapReduce paradigm.

Findings

1. Benchmarks achieved speedups of up to 48.2x.
2. Automated translation reduces manual effort.
3. Generated code targets the Hadoop, Spark, and Flink APIs.

Abstract

MapReduce is a popular programming paradigm for developing large-scale, data-intensive computation. Many frameworks that implement this paradigm have recently been developed. To leverage these frameworks, however, developers must become familiar with their APIs and rewrite existing code. Casper is a new tool that automatically translates sequential Java programs into the MapReduce paradigm. Casper identifies potential code fragments to rewrite and translates them in two steps: (1) Casper uses program synthesis to search for a program summary (i.e., a functional specification) of each code fragment. The summary is expressed using a high-level intermediate language resembling the MapReduce paradigm and verified to be semantically equivalent to the original using a theorem prover. (2) Casper generates executable code from the summary, using either the Hadoop, Spark, or Flink API. We…
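To make the two-step idea in the abstract concrete, here is a minimal Java sketch. It is not Casper's implementation, intermediate language, or generated output. It assumes a hypothetical sequential word-count loop as the code fragment to rewrite, writes a candidate summary as a map stage that emits (word, 1) pairs followed by a reduce-by-key with addition, and uses Java Streams as a local stand-in for a MapReduce API. Casper proves equivalence with a theorem prover; the `agreesOn` check below only compares outputs on sample inputs, which is testing, not proof. All class and method names are illustrative.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class SummarySketch {

    // Hypothetical original fragment: a sequential loop counting each word.
    public static Map<String, Integer> sequentialWordCount(List<String> words) {
        Map<String, Integer> counts = new HashMap<>();
        for (String w : words) {
            counts.put(w, counts.getOrDefault(w, 0) + 1);
        }
        return counts;
    }

    // Hypothetical summary in MapReduce form:
    //   map:    w -> (w, 1)
    //   reduce: combine values that share a key with +
    public static Map<String, Integer> summaryWordCount(List<String> words) {
        return words.stream()
                .map(w -> new SimpleEntry<>(w, 1))          // map stage
                .collect(Collectors.toMap(
                        SimpleEntry::getKey,                // group by word
                        SimpleEntry::getValue,              // start from the emitted 1
                        Integer::sum));                     // reduce stage: add counts
    }

    // Stand-in for verification: compares outputs on one input only.
    // (Casper instead proves equivalence for all inputs with a theorem prover.)
    public static boolean agreesOn(List<String> words) {
        return sequentialWordCount(words).equals(summaryWordCount(words));
    }

    public static void main(String[] args) {
        List<String> sample = Arrays.asList("a", "b", "a", "c", "a");
        System.out.println("summary: " + summaryWordCount(sample));
        System.out.println("agrees with loop: " + agreesOn(sample));
    }
}
```

Because integer addition is associative and commutative, the reduce step can merge partial counts in any order, which is the kind of property that lets a distributed framework combine results computed on different partitions.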