Minos: Systematically Classifying Performance and Power Characteristics of GPU Workloads on HPC Clusters
Rutwik Jain, Yiwei Jiang, Matthew D. Sinclair, Shivaram Venkataraman

TL;DR
Minos is a classification system that groups GPU workloads by their power and performance characteristics, enabling efficient profiling and accurate power and performance predictions on HPC clusters.
Contribution
Minos introduces a low-cost profiling method for classifying workloads, which reduces profiling time and improves the accuracy of power and performance predictions.
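This summary does not specify the algorithm Minos uses to classify workloads. The Python below is a purely illustrative sketch of the general idea, under stated assumptions. It takes a short, low-cost profile of an unseen workload, matches it to the most similar already-profiled reference workload with a plain nearest-neighbor search, and reuses that reference's full-run power measurement as a prediction. This is not the authors' method. The feature set (mean and peak power, SM and memory-bandwidth utilization), the 400 W normalization constant, the class labels, the function names, and all numbers are hypothetical.

```python
from __future__ import annotations

import math
from dataclasses import dataclass

GPU_TDP_W = 400.0  # hypothetical board power limit, used only to normalize power features


@dataclass
class Profile:
    name: str
    mean_power_w: float  # average power over a short profiling window
    peak_power_w: float  # highest sampled power in that window (may exceed TDP: a "spike")
    sm_util: float       # fraction of time the GPU's SMs are busy, 0..1
    mem_util: float      # fraction of peak memory bandwidth in use, 0..1


def feature_vector(p: Profile) -> tuple[float, ...]:
    """Scale power by TDP so all features have roughly comparable ranges."""
    return (p.mean_power_w / GPU_TDP_W, p.peak_power_w / GPU_TDP_W, p.sm_util, p.mem_util)


def classify(unseen: Profile, references: dict[str, Profile]) -> str:
    """Return the label of the reference workload nearest in Euclidean feature space."""
    target = feature_vector(unseen)
    return min(references, key=lambda label: math.dist(target, feature_vector(references[label])))


def predict_peak_power(unseen: Profile, references: dict[str, Profile],
                       full_run_peak_w: dict[str, float]) -> tuple[str, float]:
    """Reuse the matched class's full-run peak power instead of profiling the new app fully."""
    label = classify(unseen, references)
    return label, full_run_peak_w[label]


# Hypothetical reference workloads, each already profiled in full.
references = {
    "compute-bound": Profile("gemm", 360.0, 420.0, 0.95, 0.40),
    "memory-bound": Profile("stream", 250.0, 280.0, 0.35, 0.90),
    "latency-bound": Profile("graph", 150.0, 170.0, 0.20, 0.25),
}
full_run_peak_w = {"compute-bound": 430.0, "memory-bound": 290.0, "latency-bound": 180.0}

# A new application observed only briefly.
unseen = Profile("new-app", 340.0, 405.0, 0.90, 0.45)
print(predict_peak_power(unseen, references, full_run_peak_w))
```

The saving in this sketch comes from the new application needing only a brief profiling window; its full-run behavior is borrowed from the class it resembles. A real system would need to validate that similarity in these features actually implies similar power behavior.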
Findings
Reduces profiling time for unseen applications by 89%
Achieves 4% mean error in power predictions
Improves prediction accuracy by 10% over the state of the art
Abstract
As large-scale HPC compute clusters increasingly adopt accelerators such as GPUs to meet the voracious demands of modern workloads, these clusters are increasingly becoming power constrained. Unfortunately, modern applications can often temporarily exceed the power ratings of the accelerators ("power spikes"). Thus, current and future HPC systems must optimize for both power and performance together. However, this is made difficult by increasingly diverse applications, which often require bespoke optimizations to run efficiently on each cluster. Traditionally, researchers overcome this problem by profiling applications on specific clusters and optimizing, but the scale, algorithmic diversity, and lack of effective tools make this challenging. To overcome these inefficiencies, we propose Minos, a systematic classification mechanism that identifies similar application characteristics via…
