Analyzing Performance Properties Collected by the PerSyst Scalable HPC Monitoring Tool
David Brayford, Christoph Bernau, Wolfram Hesse, Carla Guillen

TL;DR
This paper explores how system performance data from the PerSyst HPC monitoring tool can be used to analyze execution patterns, optimize code, improve monitoring, and predict scientific code performance using machine learning.
Contribution
It introduces methods for analyzing HPC performance data to identify execution patterns, optimize code, enhance monitoring, and apply machine learning for performance prediction.
Findings
Identified key execution patterns in HPC applications.
Proposed optimization strategies based on performance analysis.
Suggested machine learning models for performance prediction.
Abstract
Understanding how a scientific application executes on a large HPC system is of great importance when allocating resources within the HPC data center. In this paper, we describe how we used system performance data to identify execution patterns, possible code optimizations, and improvements to system monitoring. We also identify candidates for employing machine learning techniques to predict the performance of similar scientific codes.
Taxonomy
Topics: Software System Performance and Reliability · Scientific Computing and Data Management · Cloud Computing and Resource Management
