Simplifying HPC resource selection: A tool for optimizing execution time and cost on Azure
Marco A. S. Netto, Wolfgang De Salvador, Davide Vanzo

TL;DR
This paper introduces an open-source tool that simplifies HPC resource selection on Azure by automating benchmarking and providing recommendations to optimize execution time and costs, using data analytics and optimization techniques.
Contribution
The paper presents a novel tool that automates HPC resource configuration and provides insights to optimize performance and costs on Azure cloud, reducing manual effort and benchmarking runs.
Findings
Reduced number of cloud executions needed for guidance
Effective use of data analytics and optimization techniques
Initial results show promising resource selection recommendations
Abstract
Azure Cloud offers a wide range of resources for running HPC workloads, requiring users to configure their deployment by selecting VM types, number of VMs, and processes per VM. Suboptimal decisions may lead to longer execution times or additional costs for the user. We are developing an open-source tool to assist users in making these decisions by considering application input parameters, as they influence resource consumption. The tool automates the time-consuming process of setting up the cloud environment, executing the benchmarking runs, handling output, and providing users with resource selection recommendations as high-level insights on run times and costs across different VM types and numbers of VMs. In this work, we present initial results and insights on reducing the number of cloud executions needed to provide such guidance, leveraging data analytics and optimization…
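To make the time-versus-cost trade-off described in the abstract concrete, here is a minimal sketch of how benchmark results could be compared. It is not the tool's implementation. The `Run` class, the `pareto_front` helper, the `vm_a`/`vm_b` labels, and all run times and prices are hypothetical values chosen for illustration. It assumes cost is simply run time × number of VMs × hourly price per VM, and it keeps only configurations that no other configuration beats on both run time and cost.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Run:
    """One benchmark result (hypothetical structure, not the tool's schema)."""
    vm_type: str
    num_vms: int
    runtime_hours: float
    price_per_vm_hour: float

    @property
    def cost(self) -> float:
        # Assumed cost model: every VM is billed for the whole run.
        return self.runtime_hours * self.num_vms * self.price_per_vm_hour


def pareto_front(runs):
    """Return runs not dominated on (run time, cost), fastest first."""
    front = []
    for r in runs:
        dominated = any(
            o.runtime_hours <= r.runtime_hours
            and o.cost <= r.cost
            and (o.runtime_hours < r.runtime_hours or o.cost < r.cost)
            for o in runs
        )
        if not dominated:
            front.append(r)
    return sorted(front, key=lambda r: r.runtime_hours)


# Made-up numbers for illustration only; not measurements or Azure prices.
RUNS = [
    Run("vm_a", 2, 4.0, 1.0),
    Run("vm_a", 4, 2.5, 1.0),
    Run("vm_b", 2, 3.0, 2.0),
    Run("vm_b", 4, 1.5, 2.0),
]

if __name__ == "__main__":
    for r in pareto_front(RUNS):
        print(f"{r.vm_type} x{r.num_vms}: {r.runtime_hours} h, cost {r.cost}")
```

In this toy data set, the `vm_b` run on 2 VMs is dropped because the `vm_a` run on 4 VMs is both faster and cheaper. The remaining configurations represent genuine trade-offs that a user would choose among according to deadline or budget.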
Taxonomy
Topics: Cloud Computing and Resource Management · Distributed and Parallel Computing Systems · Scientific Computing and Data Management
