Automatically Leveraging MapReduce Frameworks for Data-Intensive Applications
Maaz Bin Safeer Ahmad, Alvin Cheung

TL;DR
Casper is an automated tool that translates sequential Java programs into code that runs on MapReduce frameworks, so data-intensive applications can benefit from distributed processing without manual rewriting.
Contribution
Casper introduces a novel automated translation approach that combines program synthesis and theorem proving to convert sequential code into the MapReduce paradigm.
Findings
Translated benchmarks achieved speedups of up to 48.2x
Automated translation reduces manual effort
Supports Hadoop, Spark, and Flink APIs
Abstract
MapReduce is a popular programming paradigm for developing large-scale, data-intensive computation. Many frameworks that implement this paradigm have recently been developed. To leverage these frameworks, however, developers must become familiar with their APIs and rewrite existing code. Casper is a new tool that automatically translates sequential Java programs into the MapReduce paradigm. Casper identifies potential code fragments to rewrite and translates them in two steps: (1) Casper uses program synthesis to search for a program summary (i.e., a functional specification) of each code fragment. The summary is expressed using a high-level intermediate language resembling the MapReduce paradigm and verified to be semantically equivalent to the original using a theorem prover. (2) Casper generates executable code from the summary, using either the Hadoop, Spark, or Flink API. We…
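To make the idea of a program summary concrete, here is a minimal sketch of a sequential Java fragment next to an equivalent map-then-reduce formulation. It is not Casper's output and does not use Casper's intermediate language; it relies only on the standard `java.util.stream` API, and the class and method names (`WordCountSketch`, `countSequential`, `countMapReduce`) are hypothetical names chosen for illustration.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class WordCountSketch {

    // Sequential fragment: the kind of loop a translator would look for.
    public static Map<String, Integer> countSequential(List<String> words) {
        Map<String, Integer> counts = new HashMap<>();
        for (String w : words) {
            counts.put(w, counts.getOrDefault(w, 0) + 1);
        }
        return counts;
    }

    // Hand-written map/reduce form of the same computation (an assumption
    // about what an equivalent summary could look like, not generated code).
    public static Map<String, Integer> countMapReduce(List<String> words) {
        return words.stream()
                // map: emit a (word, 1) pair for every input element
                .map(w -> new SimpleEntry<String, Integer>(w, 1))
                // reduce: merge the values of pairs that share a key by summing
                .collect(Collectors.toMap(
                        Map.Entry::getKey,
                        Map.Entry::getValue,
                        Integer::sum));
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("a", "b", "a", "c", "b", "a");
        Map<String, Integer> sequential = countSequential(words);
        Map<String, Integer> mapReduce = countMapReduce(words);
        // Both forms should agree on every input; here we spot-check one.
        System.out.println("sequential = " + sequential);
        System.out.println("mapReduce  = " + mapReduce);
        System.out.println("equal      = " + sequential.equals(mapReduce));
    }
}
```

Comparing the two results on a single input is only a spot check. The abstract describes a stronger guarantee: the summary is verified with a theorem prover to be semantically equivalent to the original fragment.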
