Machine Learning Framework for Performance Anomaly in OpenMP Multi-Threaded Systems
Weidong Wang, Wangda Luo

TL;DR
This paper introduces a machine learning framework designed to detect performance anomalies in OpenMP multi-threaded systems, addressing challenges of resource contention and noise in dynamic monitoring data.
Contribution
The paper presents a novel machine learning framework specifically tailored for identifying performance anomalies in OpenMP multi-threaded applications.
Findings
Successfully detects 90.3% of injected anomalies
Validated on the NAS Parallel Benchmarks (NPB), the EPCC OpenMP micro-benchmark suite, and the Jacobi benchmark
Improves system resiliency through early anomaly detection
Abstract
Some OpenMP multi-threaded applications increasingly suffer from performance anomalies owing to shared resource contention as well as software- and hardware-related problems. Such anomalies can result in failures and inefficiencies, and are among the main challenges in system resiliency. To minimize their impact, one must quickly and accurately detect and diagnose the performance anomalies that cause the failures. However, it is difficult to identify anomalies in the dynamic and noisy data collected by OpenMP multi-threaded monitoring infrastructures. This paper presents a novel machine learning framework for performance anomaly detection in OpenMP multi-threaded systems. The framework is evaluated on the NAS Parallel Benchmarks (NPB), the EPCC OpenMP micro-benchmark suite, and the Jacobi benchmark. The experimental results…
Taxonomy
Topics: Software System Performance and Reliability · Anomaly Detection Techniques and Applications · Cloud Computing and Resource Management
