Multi-level Memory-Centric Profiling on ARM Processors with ARM SPE
Samuel Miksits, Ruimin Shi, Maya Gokhale, Jacob Wahlgren, Gabin Schieffer, Ivy Peng

TL;DR
This paper introduces a multi-level memory profiling tool for ARM processors built on the ARM Statistical Profiling Extension (SPE), and quantifies its profiling overhead and accuracy on HPC and cloud workloads.
Contribution
It presents the first quantitative evaluation of ARM SPE-based memory profiling, analyzing overhead and accuracy across different sampling configurations.
Findings
ARM SPE enables effective memory profiling on ARM processors.
Profiling overhead varies with the sampling period and the aux buffer size.
The tool provides accurate insights into memory bottlenecks in HPC and cloud workloads.
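A back-of-the-envelope model shows why the sampling period and the aux buffer size interact. The following sketch is not the tool's implementation. `SpeConfig`, `aux_fill_rate`, and the 64-byte average record size are hypothetical choices made for illustration. The model assumes that SPE selects roughly one operation every `sample_period` operations, and that each retained record lands in a perf aux buffer that the profiler must drain whenever the buffer fills.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class SpeConfig:
    """Hypothetical SPE sampling configuration (illustrative names, not a real API)."""
    sample_period: int      # operations between samples (about 1 sample per N ops)
    aux_buffer_bytes: int   # size of the perf aux buffer holding SPE records
    record_bytes: int = 64  # ASSUMPTION: average size of one SPE record in bytes


def aux_fill_rate(cfg: SpeConfig, ops_per_second: float) -> Dict[str, float]:
    """Estimate the sample rate, the data rate, and how often the aux buffer fills."""
    samples_per_s = ops_per_second / cfg.sample_period
    bytes_per_s = samples_per_s * cfg.record_bytes
    # Each full buffer must be handed to user space and copied out, so the
    # drain rate is a rough proxy for the profiling overhead.
    drains_per_s = bytes_per_s / cfg.aux_buffer_bytes
    return {
        "samples_per_s": samples_per_s,
        "bytes_per_s": bytes_per_s,
        "drains_per_s": drains_per_s,
    }


if __name__ == "__main__":
    cfg = SpeConfig(sample_period=1024, aux_buffer_bytes=4 * 1024 * 1024)
    rates = aux_fill_rate(cfg, ops_per_second=1e9)
    print(rates)  # drains_per_s ≈ 14.9 aux-buffer drains per second
```

Under these assumptions, halving the sampling period doubles the drain rate, and doubling the aux buffer halves it. Real overhead also depends on record filtering and on how the tool processes each drained buffer.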
Abstract
High-end ARM processors are emerging in data centers and HPC systems as strong contenders to x86 machines. Memory-centric profiling is an important approach for dissecting an application's memory-access bottlenecks and guiding optimizations. Many existing memory profiling tools leverage hardware performance counters and precise event sampling, such as Intel PEBS and AMD IBS, to achieve high accuracy and low overhead. In this work, we present a multi-level memory profiling tool for ARM processors that leverages the Statistical Profiling Extension (SPE). We evaluate the tool using both HPC and cloud workloads on the ARM Ampere processor. Our results provide the first quantitative assessment of the time overhead and sampling accuracy of ARM SPE for memory-centric profiling at different sampling periods and aux buffer sizes.
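A common way to recover totals from sampling facilities such as PEBS, IBS, or SPE is to scale the samples. If about one operation in every `sample_period` is sampled, each retained sample stands for roughly `sample_period` operations. The sketch below is a minimal illustration, not the authors' method. It applies that scaling per memory level and then computes the relative error against a reference count. The level labels ("L1", "L2", "DRAM") and the function names are hypothetical. The mapping from SPE records to memory levels is implementation specific and is not shown here.

```python
from collections import Counter
from typing import Dict, Iterable, Mapping


def estimate_level_counts(sample_levels: Iterable[str], sample_period: int) -> Dict[str, int]:
    """Scale per-level sample counts by the sampling period to estimate how many
    accesses each memory level served (hypothetical level labels)."""
    counts = Counter(sample_levels)
    return {level: n * sample_period for level, n in counts.items()}


def relative_error(estimate: Mapping[str, int], reference: Mapping[str, int]) -> Dict[str, float]:
    """Per-level relative error |estimate - reference| / reference.
    Levels whose reference count is zero are skipped, and levels that never
    appear in the estimate count as an estimate of zero."""
    return {
        level: abs(estimate.get(level, 0) - true) / true
        for level, true in reference.items()
        if true > 0
    }


if __name__ == "__main__":
    levels = ["L1"] * 900 + ["L2"] * 80 + ["DRAM"] * 20
    est = estimate_level_counts(levels, sample_period=1000)
    print(est)  # {'L1': 900000, 'L2': 80000, 'DRAM': 20000}
    print(relative_error(est, {"L1": 910_000, "L2": 75_000, "DRAM": 21_000}))
```

The scaled count is correct only in expectation. Levels that receive few samples therefore tend to have noisier estimates, so a longer sampling period trades lower overhead for less accurate per-level counts.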
