Running Alchemist on Cray XC and CS Series Supercomputers: Dask and PySpark Interfaces, Deployment Options, and Data Transfer Times
Kai Rothauge, Haripriya Ayyalasomayajula, Kristyn J. Maschhoff, Michael Ringenburg, Michael W. Mahoney

TL;DR
This paper describes recent developments in Alchemist, a system that improves Spark performance by interfacing it with HPC libraries. It focuses on deployment on Cray supercomputers, new Python interfaces, and data transfer overheads.
Contribution
It introduces new Python, Dask, and PySpark interfaces for Alchemist, describes container-based deployment on Cray XC and CS series supercomputers, and measures the data transfer times that affect overall performance.
Findings
Alchemist was successfully deployed on Cray XC and CS series supercomputers using container images (Shifter and Singularity, respectively).
New interfaces enable Alchemist to work with Dask and PySpark.
Data transfer times are a significant factor affecting overall performance.
Abstract
Alchemist is a system that allows Apache Spark to achieve better performance by interfacing with HPC libraries for large-scale distributed computations. In this paper, we highlight some recent developments in Alchemist that are of interest to Cray users and the scientific community in general. We discuss our experience porting Alchemist to container images and deploying it on Cray XC (using Shifter) and CS (using Singularity) series supercomputers and on a local Kubernetes cluster. Newly developed interfaces for Python, Dask, and PySpark enable the use of Alchemist with additional data analysis frameworks. We also briefly discuss the combination of Alchemist with RLlib, an increasingly popular library for reinforcement learning, and consider the benefits of leveraging HPC simulations in reinforcement learning. Finally, since data transfer between the client applications and Alchemist…
Taxonomy
Topics: Cloud Computing and Resource Management · Scientific Computing and Data Management · Parallel Computing and Optimization Techniques
