Virtualizing the Stampede2 Supercomputer with Applications to HPC in the Cloud
W. Cyrus Proctor, Mike Packard, Anagha Jamthe, Richard Cardone, Joseph Stubbs

TL;DR
This paper presents a method for building and scaling, within minutes, a virtual HPC cluster that emulates Stampede2 on the Jetstream cloud, giving suitable applications a cloud-bursting outlet and a place to test without affecting the production supercomputer.
Contribution
The authors develop a rapid deployment approach for an elastic virtual cluster that emulates Stampede2's environment, facilitating cloud bursting and testing.
Findings
Virtual cluster can be built in minutes on Jetstream
Time-to-solution on the virtual cluster is similar or otherwise acceptable relative to Stampede2 for a profiled class of applications
Virtual cluster reduces queue wait times and aids debugging
Abstract
Methods developed at the Texas Advanced Computing Center (TACC) are described and demonstrated for automating the construction of an elastic, virtual cluster emulating the Stampede2 high performance computing (HPC) system. The cluster can be built and/or scaled in a matter of minutes on the Jetstream self-service cloud system and shares many properties of the original Stampede2, including: i) common identity management, ii) access to the same file systems, iii) equivalent software application stack and module system, iv) similar job scheduling interface via Slurm. We measure time-to-solution for a number of common scientific applications on our virtual cluster against equivalent runs on Stampede2 and develop an application profile where performance is similar or otherwise acceptable. For such applications, the virtual cluster provides an effective form of "cloud bursting" with the…
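The time-to-solution comparison described in the abstract can be illustrated with a minimal sketch. It is not the authors' code: the `Timing` record, the `slowdown` ratio, the `max_slowdown` threshold of 1.25, and the sample numbers are all hypothetical placeholders chosen for illustration, since the summary does not state how "similar or otherwise acceptable" was quantified.

```python
from dataclasses import dataclass


@dataclass
class Timing:
    """Wall-clock time-to-solution for one application on both systems."""
    app: str
    stampede2_s: float  # seconds on Stampede2
    virtual_s: float    # seconds on the virtual cluster


def slowdown(t: Timing) -> float:
    """Ratio of virtual-cluster time to Stampede2 time (1.0 means parity)."""
    return t.virtual_s / t.stampede2_s


def profile(timings, max_slowdown=1.25):
    """Mark each application as acceptable for cloud bursting.

    max_slowdown is an assumed cutoff, not a value from the paper.
    """
    return {t.app: slowdown(t) <= max_slowdown for t in timings}


# Placeholder measurements, not results from the paper.
timings = [
    Timing("app_a", stampede2_s=100.0, virtual_s=110.0),
    Timing("app_b", stampede2_s=100.0, virtual_s=180.0),
]
result = profile(timings)
```

With these placeholder inputs, `app_a` falls within the assumed cutoff and `app_b` does not; in practice the cutoff would depend on how much queue wait time the user is trying to avoid.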
