NUMAscope: Capturing and Visualizing Hardware Metrics on Large ccNUMA Systems
Daniel J. Blueman (1), Foivos Zakkak (2), Christos Kotselidis (2) ((1) Numascale AS, (2) The University of Manchester)

TL;DR
NUMAscope is an open-source framework that captures and visualizes hardware metrics on large ccNUMA systems, aiding performance optimization by providing real-time and offline telemetry with low overhead.
Contribution
It introduces an extensible, open-source tool for capturing high-rate hardware metrics on ccNUMA systems with low overhead and versatile visualization options.
Findings
Low-overhead (<10%) data collection on large ccNUMA systems
Supports real-time and offline analysis modes
Provides high-resolution visualizations via web and textual interfaces
Abstract
Cache-coherent non-uniform memory access (ccNUMA) systems enable parallel applications to scale up to thousands of cores and many terabytes of main memory. However, because remote accesses come at an increased cost, extra measures are needed to scale applications to high core counts and to process far greater amounts of data than a typical server can hold. Just as applications are optimized to improve cache utilization, they also need to be optimized to improve data locality on ccNUMA systems in order to use larger topologies effectively. The first step in optimizing an application is to understand what slows it down, which requires profiling tools or manual instrumentation. When optimizing applications on large ccNUMA systems, however, there are limited mechanisms to capture and present actionable telemetry. This is partially driven…
Taxonomy
Topics: Parallel Computing and Optimization Techniques · Advanced Data Storage Technologies · Cloud Computing and Resource Management
