Towards a Comprehensive Framework for Telemetry Data in HPC Environments
Ole Weidner, Malcolm Atkinson, Adam Barker

TL;DR
This paper proposes a comprehensive framework and conceptual model for managing telemetry data in HPC environments to improve application development, portability, and adaptive strategies.
Contribution
It introduces a new software framework and conceptual model for telemetry data management in HPC, enabling better integration and adaptive application strategies.
Findings
Framework effectively collects and analyzes telemetry data
Integration with HPC architectures improves application adaptability
Supports development of portable and adaptive HPC applications
Abstract
Current HPC platforms do not provide the infrastructure, interfaces, and conceptual models to collect, store, analyze, and access telemetry data. Today, applications depend on application- and platform-specific techniques for collecting telemetry data, introducing significant development overheads that inhibit portability and mobility. The development and adoption of adaptive, context-aware strategies is thereby impaired. To facilitate 2nd-generation applications, more efficient application development, and swift adoption of adaptive applications in production, a comprehensive framework for telemetry data management must be provided by future HPC systems and services. We introduce a conceptual model and a software framework to collect, store, analyze, and exploit streams of telemetry data generated by HPC systems and their applications. We show how this framework can be integrated with HPC…
Taxonomy
Topics: Distributed and Parallel Computing Systems · Advanced Data Storage Technologies · Parallel Computing and Optimization Techniques
