Leveraging Parallel Data Processing Frameworks with Verified Lifting

Maaz Bin Safeer Ahmad; Alvin Cheung

arXiv:1611.07623 · cs.PL · November 24, 2016 · SYNT@CAV

TL;DR

Casper is a compiler that automatically converts sequential Java code into Hadoop MapReduce jobs. The translated programs run on average 3.3x faster than the sequential originals and scale better to larger datasets, with no manual rewriting.
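To make the kind of translation concrete, here is a minimal, hypothetical sketch (not Casper's actual output) contrasting a sequential word-count loop with an equivalent map/shuffle/reduce formulation. Java streams stand in for Hadoop, so no Hadoop API is used; the class and method names are illustrative assumptions.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

final class WordCountSketch {

    // Sequential fragment of the kind a lifting compiler takes as input:
    // one loop updating one mutable map.
    static Map<String, Integer> sequential(List<String> words) {
        Map<String, Integer> counts = new HashMap<>();
        for (String w : words) {
            counts.put(w, counts.getOrDefault(w, 0) + 1);
        }
        return counts;
    }

    // MapReduce-shaped version (streams stand in for Hadoop here):
    // map each word to (word, 1), group by key, then sum each group.
    static Map<String, Integer> mapReduce(List<String> words) {
        return words.stream()
                .map(w -> Map.entry(w, 1))                              // map phase: emit (key, 1)
                .collect(Collectors.groupingBy(
                        Map.Entry::getKey,                              // shuffle: group by key
                        Collectors.summingInt(Map.Entry::getValue)));   // reduce: sum per key
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("a", "b", "a", "c", "b", "a");
        System.out.println(sequential(words).equals(mapReduce(words))); // true
    }
}
```

The map-side function touches one element at a time and the reduce-side function only combines values that share a key, which is what lets a framework like Hadoop partition the work across machines.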

Contribution

It uses verified lifting to automatically retarget sequential Java code for Hadoop, sparing developers the tedious and error-prone work of rewriting code by hand while improving performance.

Findings

01. Average 3.3x speedup over sequential Java implementations
02. Automatically translates Java benchmarks into Hadoop
03. Improved scalability to larger datasets

Abstract

Many parallel data frameworks have been proposed in recent years that let sequential programs access parallel processing. To capitalize on the benefits of such frameworks, existing code must often be rewritten in the domain-specific languages that each framework supports. This rewriting, which is tedious and error-prone, also requires developers to choose the framework that best optimizes performance for a given workload. This paper describes Casper, a novel compiler that automatically retargets sequential Java code for execution on Hadoop, a parallel data processing framework that implements the MapReduce paradigm. Given a sequential code fragment, Casper uses verified lifting to infer a high-level summary expressed in our program specification language that is then compiled for execution on Hadoop. We demonstrate that Casper automatically translates Java benchmarks into Hadoop. The…
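The lifting step can be illustrated with a toy search: enumerate candidate summaries (a per-element map plus a reduce operator) and keep the first one that agrees with the sequential fragment. This is a simplified sketch under stated assumptions. The `Summary` record, the three-candidate search space, and the test-based check are all hypothetical stand-ins for Casper's program specification language and its verification step; verified lifting is meant to establish equivalence for all inputs, whereas this sketch only checks a finite sample.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;
import java.util.function.IntBinaryOperator;
import java.util.function.IntUnaryOperator;

final class LiftingSketch {

    // Sequential fragment to be lifted: sum of squares over an array.
    static int sequential(int[] xs) {
        int acc = 0;
        for (int x : xs) {
            acc += x * x;
        }
        return acc;
    }

    // Hypothetical summary form: per-element map, reduce operator, and its identity.
    // This is NOT Casper's specification language, just an illustrative stand-in.
    record Summary(String name, IntUnaryOperator map, IntBinaryOperator reduce, int identity) {
        int run(int[] xs) {
            int acc = identity;
            for (int x : xs) {
                // Folded sequentially here; a parallel runtime would also need reduce to be associative.
                acc = reduce.applyAsInt(acc, map.applyAsInt(x));
            }
            return acc;
        }
    }

    // A toy search space of three candidate summaries.
    static List<Summary> defaultCandidates() {
        return List.of(
                new Summary("map x -> x, reduce +", x -> x, Integer::sum, 0),
                new Summary("map x -> x*x, reduce max", x -> x * x, Math::max, 0),
                new Summary("map x -> x*x, reduce +", x -> x * x, Integer::sum, 0));
    }

    // Return the name of the first candidate that matches the sequential code on
    // every test input, or null if none does. Testing is much weaker than a proof.
    static String lift(List<Summary> candidates, List<int[]> tests) {
        for (Summary s : candidates) {
            boolean agrees = true;
            for (int[] xs : tests) {
                if (s.run(xs) != sequential(xs)) {
                    agrees = false;
                    break;
                }
            }
            if (agrees) {
                return s.name();
            }
        }
        return null;
    }

    // Random small arrays (length 0..7, values in [-10, 10]) plus one fixed input.
    static List<int[]> testInputs(int count, long seed) {
        Random rnd = new Random(seed);
        List<int[]> tests = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            int[] xs = new int[rnd.nextInt(8)];
            for (int j = 0; j < xs.length; j++) {
                xs[j] = rnd.nextInt(21) - 10;
            }
            tests.add(xs);
        }
        tests.add(new int[] {3, 4}); // deterministically rules out the first two candidates
        return tests;
    }

    public static void main(String[] args) {
        System.out.println(lift(defaultCandidates(), testInputs(50, 42L)));
    }
}
```

Running `main` prints `map x -> x*x, reduce +`: on the fixed input `{3, 4}` the first candidate yields 7 and the second yields 16, while the sequential code yields 25. Once a summary is accepted, its map and reduce parts are what a backend would compile into framework code.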
