Analyzing Resource Utilization in an HPC System: A Case Study of NERSC Perlmutter
Jie Li, George Michelogiannakis, Brandon Cook, Dulanya Cooray, Yong Chen

TL;DR
This study analyzes one month of resource usage on NERSC's Perlmutter HPC system. It finds that CPUs, host memory, and GPU memory are often not fully utilized under per-node allocation, and it highlights the need for more fine-grained resource allocation policies.
Contribution
It provides an early, detailed analysis of resource usage patterns in Perlmutter, emphasizing the potential for improved resource management strategies.
Findings
CPUs are commonly not fully utilized, especially for GPU-enabled jobs
Around 64% of both CPU and GPU-enabled jobs used 50% or less of the available host memory capacity
About 50% of GPU-enabled jobs used 25% or less of the GPU memory
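Findings of this form ("X% of jobs used at most Y% of a resource") are threshold shares over per-job peak utilization. The following is a minimal sketch of that calculation, not the authors' analysis pipeline: the `JobUsage` record, its field names, and the sample values are hypothetical and do not come from Perlmutter data.

```python
from dataclasses import dataclass


@dataclass
class JobUsage:
    """Hypothetical per-job record; field names are illustrative assumptions."""
    job_id: str
    peak_mem_gib: float   # peak host memory the job used
    capacity_gib: float   # host memory available to the job


def share_at_or_below(jobs, threshold):
    """Return the fraction of jobs whose peak memory utilization is <= threshold."""
    if not jobs:
        raise ValueError("at least one job is required")
    count = sum(1 for j in jobs if j.peak_mem_gib / j.capacity_gib <= threshold)
    return count / len(jobs)


# Made-up sample records, for illustration only.
jobs = [
    JobUsage("a", 100.0, 512.0),
    JobUsage("b", 256.0, 512.0),
    JobUsage("c", 400.0, 512.0),
    JobUsage("d", 30.0, 256.0),
]

half_or_less = share_at_or_below(jobs, 0.50)
quarter_or_less = share_at_or_below(jobs, 0.25)
print(f"{half_or_less:.0%} of sample jobs used 50% or less of host memory")
print(f"{quarter_or_less:.0%} of sample jobs used 25% or less of host memory")
```

The same function applies to GPU memory or CPU utilization if the record carries the corresponding peak and capacity values. Using the per-job peak rather than the mean is an assumption made here for simplicity.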
Abstract
Resource demands of HPC applications vary significantly. However, it is common for HPC systems to primarily assign resources on a per-node basis to prevent interference from co-located workloads. This gap between the coarse-grained resource allocation and the varying resource demands can lead to HPC resources being not fully utilized. In this study, we analyze the resource usage and application behavior of NERSC's Perlmutter, a state-of-the-art open-science HPC system with both CPU-only and GPU-accelerated nodes. Our one-month usage analysis reveals that CPUs are commonly not fully utilized, especially for GPU-enabled jobs. Also, around 64% of both CPU and GPU-enabled jobs used 50% or less of the available host memory capacity. Additionally, about 50% of GPU-enabled jobs used up to 25% of the GPU memory, and the memory capacity was not fully utilized in some ways for all jobs. While our…
Taxonomy
Topics: Cloud Computing and Resource Management · Distributed and Parallel Computing Systems · Advanced Data Storage Technologies
