HPAT: High Performance Analytics with Scripting Ease-of-Use
Ehsan Totoni, Todd A. Anderson, Tatiana Shpeisman

TL;DR
HPAT is a compiler-based toolkit that automatically parallelizes high-level Julia scripts for big data analytics, achieving significant speedups over Spark on supercomputers and cloud platforms.
Contribution
It introduces a novel auto-parallelizing compiler approach tailored for data analytics that exploits domain characteristics such as the map/reduce parallel pattern, enabling automatic generation of optimized MPI/C++ code for high-performance computing environments.
Findings
HPAT is 369x to 2033x faster than Spark on the Cori supercomputer.
HPAT is 20x to 256x faster than Spark on Amazon AWS.
Unlike previous auto-parallelization methods, the approach is robust, and it applies automatic optimizations to scripting programs, such as fusion of array operations.
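To make "fusion of array operations" concrete, here is a minimal sketch in Python (chosen only for illustration; HPAT itself compiles Julia). The function names `unfused` and `fused` are hypothetical. The unfused version evaluates each array operation separately and materializes a temporary for each one, while the fused version computes the same result in a single pass with no temporaries, which is the kind of code shape a fusing compiler aims to produce. This is not HPAT's actual transformation, just an illustration of the idea.

```python
def unfused(a, b):
    # Each array operation is evaluated on its own and produces a full
    # temporary array, like running c = a .* b; d = c .+ 1.0; sum(d)
    # one statement at a time.
    c = [x * y for x, y in zip(a, b)]  # temporary 1
    d = [x + 1.0 for x in c]           # temporary 2
    return sum(d)


def fused(a, b):
    # The same computation fused into one loop: a single pass over the
    # inputs and no intermediate arrays.
    s = 0.0
    for x, y in zip(a, b):
        s += x * y + 1.0
    return s


a = [1.0, 2.0, 3.0]
b = [4.0, 5.0, 6.0]
print(unfused(a, b), fused(a, b))  # prints "35.0 35.0"
```

Both versions return the same value. The fused loop avoids allocating and re-reading the intermediate arrays, which matters when the arrays are large.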
Abstract
Big data analytics requires high programmer productivity and high performance simultaneously on large-scale clusters. However, current big data analytics frameworks (e.g. Apache Spark) have prohibitive runtime overheads since they are library-based. We introduce a novel auto-parallelizing compiler approach that exploits the characteristics of the data analytics domain such as the map/reduce parallel pattern and is robust, unlike previous auto-parallelization methods. Using this approach, we build High Performance Analytics Toolkit (HPAT), which parallelizes high-level scripting (Julia) programs automatically, generates efficient MPI/C++ code, and provides resiliency. Furthermore, it provides automatic optimizations for scripting programs, such as fusion of array operations. Thus, HPAT is 369x to 2033x faster than Spark on the Cori supercomputer and 20x to 256x faster on Amazon AWS.
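The map/reduce pattern that the abstract says HPAT exploits can be sketched as follows. This is a minimal single-process Python illustration under stated assumptions, not HPAT's generated code: the "ranks" are simulated in a loop, the helper names (`block_range`, `local_map_reduce`, `parallel_sum_of_squares`) are hypothetical, and the 1D block partition is one common distribution choice assumed here. In real MPI code, each rank would run `local_map_reduce` on its own block and the partial results would be combined with a collective such as `MPI_Allreduce`.

```python
def block_range(n, nranks, rank):
    # Contiguous 1D block partition of n elements over nranks ranks;
    # the first n % nranks ranks each receive one extra element.
    base, extra = divmod(n, nranks)
    start = rank * base + min(rank, extra)
    end = start + base + (1 if rank < extra else 0)
    return start, end


def local_map_reduce(data, rank, nranks):
    # Map step (x -> x*x) and local reduction (sum) over this rank's block only.
    start, end = block_range(len(data), nranks, rank)
    return sum(x * x for x in data[start:end])


def parallel_sum_of_squares(data, nranks):
    # Simulated ranks: summing the partial results stands in for an allreduce.
    partials = [local_map_reduce(data, r, nranks) for r in range(nranks)]
    return sum(partials)


data = list(range(10))
print(parallel_sum_of_squares(data, 4))  # prints 285
```

Because the map is element-wise and the reduction (addition) is associative, each block can be processed independently and the result does not depend on the number of ranks; this structure is what lets a compiler parallelize such loops without user annotations.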
Taxonomy
Topics: Parallel Computing and Optimization Techniques · Cloud Computing and Resource Management · Advanced Data Storage Technologies
