The MIT Supercloud Workload Classification Challenge
Benny J. Tang, Qiqi Chen, Matthew L. Weiss, Nathan Frey, Joseph McDonald, David Bestor, Charles Yee, William Arcand, Chansup Byun, Daniel Edelman, Matthew Hubbell, Michael Jones, Jeremy Kepner, Anna Klein, Adam Michaleas, Peter Michaleas, Lauren Milechin, Julia Mullen

TL;DR
This paper introduces a workload classification challenge using the MIT Supercloud Dataset to improve AI and ML workload identification for better resource management in HPC and cloud environments.
Contribution
It provides a labeled dataset and initial results to foster new algorithms for workload classification in heterogeneous datacenter environments.
Findings
Initial classification results demonstrate potential for improved accuracy.
The dataset enables development of AI-based workload identification methods.
Public availability of data and code supports further research.
Abstract
High-Performance Computing (HPC) centers and cloud providers support an increasingly diverse set of applications on heterogeneous hardware. As Artificial Intelligence (AI) and Machine Learning (ML) workloads have become an increasingly large share of the compute workloads, new approaches to optimized resource usage, allocation, and deployment of new AI frameworks are needed. By identifying compute workloads and their utilization characteristics, HPC systems may be able to better match available resources with the application demand. By leveraging datacenter instrumentation, it may be possible to develop AI-based approaches that can identify workloads and provide feedback to researchers and datacenter operators for improving operational efficiency. To enable this research, we released the MIT Supercloud Dataset, which provides detailed monitoring logs from the MIT Supercloud cluster.…
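To make the idea of identifying workloads from utilization characteristics concrete, the following is a minimal sketch, not the paper's method. It assumes each job is represented by a single utilization time series, reduces it to three summary features, and assigns the label of the nearest class centroid. The function names, the labels `steady_training` and `bursty_inference`, and all trace values are hypothetical and are not taken from the MIT Supercloud Dataset.

```python
from statistics import mean, pstdev

def summarize(trace):
    """Reduce a utilization time series (percent values) to simple features."""
    return (mean(trace), pstdev(trace), max(trace))

def fit_centroids(labeled_traces):
    """Average the feature vectors of each class to get one centroid per label."""
    by_label = {}
    for label, trace in labeled_traces:
        by_label.setdefault(label, []).append(summarize(trace))
    return {
        label: tuple(mean(column) for column in zip(*features))
        for label, features in by_label.items()
    }

def classify(trace, centroids):
    """Return the label whose centroid is closest in squared Euclidean distance."""
    features = summarize(trace)

    def squared_distance(centroid):
        return sum((a - b) ** 2 for a, b in zip(features, centroid))

    return min(centroids, key=lambda label: squared_distance(centroids[label]))

# Synthetic, made-up traces for illustration only (hypothetical labels and values).
training = [
    ("steady_training", [92, 95, 94, 96, 93, 95]),
    ("steady_training", [88, 90, 91, 89, 92, 90]),
    ("bursty_inference", [5, 80, 3, 75, 4, 82]),
    ("bursty_inference", [10, 70, 8, 65, 12, 72]),
]
centroids = fit_centroids(training)
prediction = classify([90, 93, 91, 94, 92, 90], centroids)
```

A nearest-centroid rule is used here only because it is short and dependency-free; real monitoring logs would likely need richer features and a stronger model.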
