Job Placement Advisor Based on Turnaround Predictions for HPC Hybrid Clouds
Renato L. F. Cunha, Eduardo R. Rodrigues, Leonardo P. Tizzei, Marco A. S. Netto (IBM Research)

TL;DR
This paper presents a job placement tool for HPC hybrid clouds that accounts for prediction inaccuracies in execution and wait times, improving decision-making and reducing turnaround times.
Contribution
It introduces a novel decision-making tool that considers prediction errors and extends machine learning predictors with scheduler data for better accuracy.
Findings
Predictions should sometimes be disregarded for workload efficiency.
Scheduler data significantly improves machine learning prediction accuracy.
A 20% improvement in prediction accuracy was achieved.
Abstract
Several companies and research institutes are moving their CPU-intensive applications to hybrid High Performance Computing (HPC) cloud environments. Such a shift depends on the creation of software systems that help users decide where a job should be placed, considering execution time and the queue wait time to access on-premise clusters. Relying blindly on turnaround prediction techniques will negatively affect response times inside HPC cloud environments. This paper introduces a tool to make job placement decisions in HPC hybrid cloud environments, taking into account the inaccuracy of execution and waiting time predictions. We used job traces from real supercomputing centers to run our experiments, and compared the performance between environments using real speedup curves. We also extended a state-of-the-art machine learning based predictor to work with data from the cluster scheduler.…
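To make the idea of placement under uncertain predictions concrete, here is a minimal Python sketch. It is not the paper's algorithm: the interval-overlap rule, the function names (`turnaround_interval`, `place_job`), the relative error bounds, and the `cloud_slowdown` factor are all assumptions introduced for illustration. The sketch compares the predicted turnaround (wait plus execution) on the on-premise cluster with the predicted execution time in the cloud, and falls back to a default environment when the error bounds overlap, that is, when the predictions are too uncertain to separate the two options.

```python
def turnaround_interval(wait_s, run_s, wait_err, run_err):
    """Return (low, high) bounds on turnaround time in seconds.

    wait_err and run_err are assumed relative errors of the predictors
    (e.g. 0.5 means the true value may be 50% lower or higher).
    """
    low = wait_s * (1 - wait_err) + run_s * (1 - run_err)
    high = wait_s * (1 + wait_err) + run_s * (1 + run_err)
    return max(low, 0.0), high


def place_job(wait_s, run_s, wait_err, run_err, cloud_slowdown,
              default="on-premise"):
    """Pick an environment, ignoring the predictions when they are inconclusive.

    cloud_slowdown is a hypothetical factor (>1 means the job runs slower
    in the cloud); the cloud queue wait is assumed to be zero.
    """
    local_low, local_high = turnaround_interval(wait_s, run_s, wait_err, run_err)
    cloud_low, cloud_high = turnaround_interval(
        0.0, run_s * cloud_slowdown, 0.0, run_err
    )

    if local_high < cloud_low:   # on-premise wins even in its worst case
        return "on-premise"
    if cloud_high < local_low:   # cloud wins even in its worst case
        return "cloud"
    return default               # intervals overlap: disregard the predictions


# A long predicted queue wait (10 h) makes the cloud the clear choice.
print(place_job(36000, 3600, wait_err=0.5, run_err=0.2, cloud_slowdown=1.5))
# A short predicted wait with noisy predictions is inconclusive.
print(place_job(60, 3600, wait_err=0.5, run_err=0.2, cloud_slowdown=1.5))
```

The fallback branch is the part that mirrors the finding that predictions should sometimes be disregarded: when the error bounds overlap, the predicted ordering of the two environments cannot be trusted, so the sketch makes no prediction-driven choice.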
