Alchemist: An Apache Spark <=> MPI Interface
Alex Gittens, Kai Rothauge, Shusen Wang, Michael W. Mahoney, Jey Kottalam, Lisa Gerhardt, Prabhat, Michael Ringenburg, Kristyn Maschhoff

TL;DR
Alchemist enables seamless integration of MPI libraries into Spark applications, significantly improving performance for linear algebra tasks without sacrificing Spark's ease of use.
Contribution
This work introduces Alchemist, a system that allows Spark to efficiently call MPI-based libraries, reducing overheads and enhancing performance for large-scale data computations.
Findings
Alchemist reduces computation time for matrix operations.
Performance improvements are demonstrated on the NERSC Cori supercomputer.
Alchemist maintains Spark's productivity benefits while enhancing efficiency.
Abstract
The Apache Spark framework for distributed computation is popular in the data analytics community due to its ease of use, but its MapReduce-style programming model can incur significant overheads when performing computations that do not map directly onto this model. One way to mitigate these costs is to off-load computations onto MPI codes. In recent work, we introduced Alchemist, a system for the analysis of large-scale data sets. Alchemist calls MPI-based libraries from within Spark applications, and it has minimal coding, communication, and memory overheads. In particular, Alchemist allows users to retain the productivity benefits of working within the Spark software ecosystem without sacrificing performance efficiency in linear algebra, machine learning, and other related computations. In this paper, we discuss the motivation behind the development of Alchemist, and we provide a…
Taxonomy
Topics: Parallel Computing and Optimization Techniques · Stochastic Gradient Optimization Techniques · Advanced Data Storage Technologies
