Analyzing Performance Properties Collected by the PerSyst Scalable HPC Monitoring Tool

David Brayford; Christoph Bernau; Wolfram Hesse; Carla Guillen

arXiv:2009.06061 · cs.DC · September 15, 2020 · 1 citation

TL;DR

This paper explores how system performance data collected by the PerSyst HPC monitoring tool can be used to identify execution patterns, possible code optimizations, and improvements to system monitoring, and to find candidate codes whose performance could be predicted with machine learning.

Contribution

It shows how HPC system performance data can be analyzed to identify execution patterns, possible code optimizations, and improvements to system monitoring, and to select candidate codes for machine-learning-based performance prediction.

Findings

01. Identified execution patterns of scientific applications on a large HPC system.
02. Identified possible code optimizations based on the performance analysis.
03. Identified candidate codes for using machine learning to predict the performance of similar scientific codes.

Abstract

The ability to understand how a scientific application is executed on a large HPC system is of great importance for allocating resources within the HPC data center. In this paper, we describe how we used system performance data to identify execution patterns, possible code optimizations, and improvements to the system monitoring. We also identify candidates for employing machine learning techniques to predict the performance of similar scientific codes.

Taxonomy

Topics: Software System Performance and Reliability · Scientific Computing and Data Management · Cloud Computing and Resource Management