ScALPEL: A Scalable Adaptive Lightweight Performance Evaluation Library for application performance monitoring
Hari K. Pyla, Bharath Ramesh, Calvin J. Ribbens, Srinidhi Varadarajan

TL;DR
ScALPEL is a portable, low-overhead, runtime-configurable library for application performance monitoring that supports large-scale distributed systems without source code modifications.
Contribution
It introduces a scalable, adaptive, lightweight performance evaluation library that extends existing frameworks to enable efficient, function-level monitoring in distributed environments.
Findings
Supports dynamic function selection and event monitoring
Achieves low runtime overhead suitable for production use
Compatible with existing performance tools like Perfmon and PAPI
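The first finding, selecting which functions to monitor while the program runs, can be illustrated with a small sketch. This is not ScALPEL's interface: the summary above does not describe its API, and ScALPEL works without source code modifications and builds on tools such as Perfmon and PAPI. The sketch below instead assumes a Python decorator and a wall-clock timer as stand-ins, and every name in it (`SelectiveMonitor`, `select`, `deselect`, `watch`, `solve`, `exchange`) is hypothetical.

```python
import time
from collections import defaultdict
from functools import wraps


class SelectiveMonitor:
    """Hypothetical stand-in: times only the functions currently selected."""

    def __init__(self):
        self.enabled = set()                # names monitored right now
        self.calls = defaultdict(int)       # name -> number of monitored calls
        self.elapsed_ns = defaultdict(int)  # name -> total monitored time (ns)

    def select(self, name):
        """Start monitoring a function by name; takes effect on the next call."""
        self.enabled.add(name)

    def deselect(self, name):
        """Stop monitoring a function by name."""
        self.enabled.discard(name)

    def watch(self, func):
        """Wrap func so it can be monitored; unselected calls pay only a set lookup."""
        name = func.__name__

        @wraps(func)
        def wrapper(*args, **kwargs):
            if name not in self.enabled:
                return func(*args, **kwargs)
            start = time.perf_counter_ns()
            try:
                return func(*args, **kwargs)
            finally:
                self.elapsed_ns[name] += time.perf_counter_ns() - start
                self.calls[name] += 1

        return wrapper


monitor = SelectiveMonitor()


@monitor.watch
def solve(n):
    return sum(i * i for i in range(n))


@monitor.watch
def exchange(n):
    return list(range(n))


monitor.select("solve")    # monitor only solve()
solve(1000)
exchange(10)               # runs unmonitored
monitor.deselect("solve")  # reconfigure at run time
solve(1000)                # no longer recorded
```

The point of the design is that the selection is checked on each call, so the monitored set can change without restarting the program, and functions that are not selected incur almost no extra cost. A real tool would read hardware event counters rather than a wall-clock timer.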
Abstract
As supercomputers continue to grow in scale and capabilities, it is becoming increasingly difficult to isolate processor and system level causes of performance degradation. Over the last several years, a significant number of performance analysis and monitoring tools have been built or proposed. However, these tools suffer from several important shortcomings, particularly in distributed environments. In this paper we present ScALPEL, a Scalable Adaptive Lightweight Performance Evaluation Library for application performance monitoring at the functional level. Our approach provides several distinct advantages. First, ScALPEL is portable across a wide variety of architectures, and its ability to selectively monitor functions presents low run-time overhead, enabling its use for large-scale production applications. Second, it is run-time configurable, enabling both dynamic selection of…
Taxonomy
Topics: Parallel Computing and Optimization Techniques · Cloud Computing and Resource Management · Software System Performance and Reliability
