Usability Evaluation of Cloud for HPC Applications
Vanessa Sochat, Daniel Milroy, Abhik Sarkar, Aniruddha Marathe, Tapasya Patki

TL;DR
This paper evaluates the usability of cloud platforms for HPC applications by testing 11 HPC proxy applications and benchmarks across three clouds, six environments, and both CPU and GPU configurations. It distills the results into insights and best practices for cloud-based HPC workloads.
Contribution
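The paper offers a cross-platform usability study of HPC applications on three major cloud providers (Microsoft Azure, Amazon Web Services, and Google Cloud), compared against on-premises HPC clusters. It includes scaling tests and methodological guidance.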
Findings
Scaling tests in all environments reached up to 28,672 CPUs and 256 GPUs.
Performance and usability vary across cloud providers and configurations.
The study establishes a foundation for best practices in cloud-based HPC.
Abstract
The rise of AI and the economic dominance of cloud computing have created a new nexus of innovation for high performance computing (HPC), which has a long history of driving scientific discovery. In addition to performance needs, scientific workflows increasingly demand capabilities of cloud environments: portability, reproducibility, dynamism, and automation. As converged cloud environments emerge, there is growing need to study their fit for HPC use cases. Here we present a cross-platform usability study that assesses 11 different HPC proxy applications and benchmarks across three clouds (Microsoft Azure, Amazon Web Services, and Google Cloud), six environments, and two compute configurations (CPU and GPU) against on-premises HPC clusters at a major center. We perform scaling tests of applications in all environments up to 28,672 CPUs and 256 GPUs. We present methodology and results…
Taxonomy
Topics: Cloud Computing and Resource Management · Scientific Computing and Data Management · Distributed and Parallel Computing Systems
