A Study on the Resource Utilization and User Behavior on Titan Supercomputer
Sergio Iserte

TL;DR
This study analyzes user behavior and resource utilization patterns on the Titan supercomputer using data science techniques, with the aim of informing the design of future exascale HPC systems.
Contribution
It introduces a comprehensive analysis of Titan's workload and resource usage, including a predictive model and methodology applicable to other HPC clusters.
Findings
Identified seasonal patterns in resource usage.
Developed a predictive model for resource utilization.
Revealed correlations between projects, jobs, and hardware components.
Abstract
Understanding how users of HPC facilities behave, and how computational resources are requested and utilized, is not only crucial for cluster productivity but also essential for designing and constructing future exascale HPC systems. This paper tackles Challenge 4, 'Analyzing Resource Utilization and User Behavior on Titan Supercomputer', of the 2021 Smoky Mountains Conference Data Challenge. Specifically, we dig into Titan's records to discover patterns and extract relationships. The paper explores the workload distribution and usage patterns found in resource manager system logs, GPU traces, and scientific-area information collected from the Titan supercomputer. Furthermore, we want to know how resource utilization and user behavior change over time. Using data science methods, such as correlations, clustering, and neural networks, our findings allow us to investigate how…
