LLload: An Easy-to-Use HPC Utilization Tool
Chansup Byun, Albert Reuther, Julie Mullen, LaToya Anderson, William Arcand, Bill Bergeron, David Bestor, Alexander Bonn, Daniel Burrill, Vijay Gadepally, Michael Houle, Matthew Hubbell, Hayden Jananthan, Michael Jones, Piotr Luszczek, Peter Michaleas, Lauren Milechin

TL;DR
LLload is a user-friendly HPC tool that helps monitor and optimize resource utilization, leading to improved efficiency and throughput in supercomputing environments.
Contribution
The paper introduces LLload, a simple command-line tool for monitoring and characterizing HPC workloads to enhance resource utilization and performance.
Findings
Preliminary results suggest a significant improvement in GPU utilization.
Overloading GPUs improves throughput performance.
Insights from LLload support better workload management.
Abstract
The increasing use and cost of high performance computing (HPC) require new easy-to-use tools to enable HPC users and HPC systems engineers to transparently understand the utilization of resources. The MIT Lincoln Laboratory Supercomputing Center (LLSC) has developed a simple command, LLload, to monitor and characterize HPC workloads. LLload plays an important role in identifying opportunities for better utilization of compute resources. LLload can be used to monitor jobs both programmatically and interactively. LLload can characterize users' jobs using various LLload options to achieve better efficiency. This information can be used to inform the user to optimize HPC workloads and improve both CPU and GPU utilization. This includes improvements using judicious oversubscription of the computing resources. Preliminary results suggest significant improvement in GPU utilization and…
Taxonomy
Topics: Distributed and Parallel Computing Systems · Advanced Data Storage Technologies
