Feature Selection for Learning to Predict Outcomes of Compute Cluster Jobs with Application to Decision Support
Adedolapo Okanlawon, Huichen Yang, Avishek Bose, William Hsu, Dan Andresen, Mohammed Tanash

TL;DR
This paper introduces a machine learning framework with feature selection techniques to predict HPC job outcomes, aiding decision support for resubmission or migration, achieving high accuracy and interpretability.
Contribution
It presents a novel approach combining feature selection and supervised learning on HPC workload data for outcome prediction and decision support.
Findings
Reported an R^2 of 95% for regression and 99% accuracy for classification.
Identified five key predictors for CPU and memory.
Demonstrated effective feature selection methods for HPC data.
Abstract
We present a machine learning framework and a new test bed for data mining from the Slurm Workload Manager for high-performance computing (HPC) clusters. The focus was to find a method for selecting features to support decisions: helping users decide whether to resubmit failed jobs with boosted CPU and memory allocations or migrate them to a computing cloud. This task was cast as both supervised classification and regression learning, specifically, sequential problem solving suitable for reinforcement learning. Selecting relevant features can improve training accuracy, reduce training time, and produce a more comprehensible model, with an intelligent system that can explain predictions and inferences. We present a supervised learning model trained on a Simple Linux Utility for Resource Management (Slurm) data set of HPC jobs using three different techniques for selecting features:…
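The abstract is cut off before it names the three feature selection techniques, so the following is only a minimal sketch of the general idea, not the authors' method. It uses a simple filter approach (ranking features by absolute Pearson correlation with the outcome) as a stand-in. The field names (`req_cpus`, `req_mem_gb`, `timelimit_min`, `failed`) and the synthetic data generator are hypothetical and are not taken from the paper's Slurm data set.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def rank_features(rows, target, feature_names):
    """Return (feature, score) pairs sorted by |correlation with target|, highest first."""
    ys = [row[target] for row in rows]
    scores = {
        name: abs(pearson([row[name] for row in rows], ys))
        for name in feature_names
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


def select_top_k(rows, target, feature_names, k):
    """Keep the names of the k highest-scoring features."""
    return [name for name, _ in rank_features(rows, target, feature_names)[:k]]


def make_synthetic_jobs(n=200, seed=0):
    """Hypothetical job records: a job 'fails' when it requests under 16 GB of memory."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        req_mem_gb = rng.uniform(1, 64)
        rows.append({
            "req_cpus": rng.randint(1, 32),         # unrelated to the outcome here
            "req_mem_gb": req_mem_gb,               # drives the synthetic outcome
            "timelimit_min": rng.uniform(10, 1440), # unrelated to the outcome here
            "failed": 1 if req_mem_gb < 16 else 0,
        })
    return rows


if __name__ == "__main__":
    jobs = make_synthetic_jobs()
    features = ["req_cpus", "req_mem_gb", "timelimit_min"]
    for name, score in rank_features(jobs, "failed", features):
        print(f"{name}: {score:.3f}")
```

A filter like this scores each feature independently of any model, so it is cheap but ignores interactions between features. Wrapper or embedded methods would trade more computation for the ability to capture such interactions.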
