Modeling memory bandwidth patterns on NUMA machines with performance counters
Daniel Goodman, Roni Haecki, Tim Harris

TL;DR
This paper presents a model that predicts memory bandwidth requirements on NUMA systems based on thread placement, aiding performance optimization and debugging in data analytics applications.
Contribution
It introduces a thread placement-based bandwidth modeling approach using performance counters, with high prediction accuracy demonstrated through extensive measurements.
Findings
Median prediction error of 2.34% in bandwidth
Model enables performance debugging and system load prediction
Applicable to optimizing thread and memory placement strategies
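The headline figure is a median prediction error. This page does not say exactly how the paper normalizes the error; the abstract is truncated at "2.34% of the…". The sketch below assumes the error for each measurement is taken relative to the measured bandwidth. The function name `median_percent_error` and the sample values are hypothetical.

```python
from statistics import median


def median_percent_error(predicted, measured):
    """Median of |predicted - measured| / measured, expressed as a percentage.

    Assumption: each error is normalized by its own measured value. The
    paper's actual normalization is not stated on this page.
    """
    if len(predicted) != len(measured) or not measured:
        raise ValueError("need equal-length, non-empty sequences")
    errors = [abs(p - m) / m * 100.0 for p, m in zip(predicted, measured)]
    return median(errors)


# Hypothetical predicted vs. measured bandwidth (GB/s) for three placements.
print(median_percent_error([10.2, 19.5, 31.0], [10.0, 20.0, 30.0]))
```

A median is less sensitive than a mean to a few badly mispredicted placements. Over thousands of measurements, it summarizes how close a typical prediction is.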
Abstract
Computers used for data analytics are often NUMA systems with multiple sockets per machine, multiple cores per socket, and multiple thread contexts per core. Getting peak performance out of these machines requires the correct number of threads to be placed in the correct positions on the machine. One particularly interesting element of the placement of memory and threads is the way it affects the movement of data around the machine, and the increased latency this can introduce to reads and writes. In this paper we describe work on modeling the bandwidth requirements of an application on a NUMA compute node based on the placement of threads. The model is parameterized by sampling performance counters during two application runs with carefully chosen thread placements. Evaluating the model with thousands of measurements shows a median difference from predictions of 2.34% of the…
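The core idea is that the same total memory demand can load a NUMA machine very differently depending on where threads run. The toy sketch below illustrates this. It is not the authors' model, which is parameterized from performance counters sampled during two chosen runs. Several inputs here are hypothetical assumptions:

* The function name `predict_traffic`.
* A per-thread demand figure `demand_gbps`.
* Memory pages interleaved uniformly across sockets.

```python
from collections import defaultdict


def predict_traffic(placement, demand_gbps, num_sockets):
    """Toy placement-based prediction of NUMA memory traffic (hypothetical).

    placement:   thread id -> socket id the thread runs on.
    demand_gbps: thread id -> that thread's memory bandwidth demand, e.g.
                 estimated from performance-counter samples (assumed input).
    Assumes pages are interleaved uniformly, so each thread directs
    1/num_sockets of its traffic at every socket's memory.

    Returns (controller, link):
      controller: home socket -> load on that socket's memory controller.
      link: (requesting socket, home socket) -> traffic crossing the interconnect.
    """
    controller = defaultdict(float)
    link = defaultdict(float)
    share = 1.0 / num_sockets
    for thread, src in placement.items():
        d = demand_gbps[thread]
        for dst in range(num_sockets):
            # Every socket's memory serves an equal share of this thread's demand.
            controller[dst] += d * share
            if dst != src:
                # Accesses to another socket's memory must cross the interconnect.
                link[(src, dst)] += d * share
    return dict(controller), dict(link)


demand = {t: 5.0 for t in range(4)}
# All four threads packed on socket 0, versus two threads per socket.
print(predict_traffic({0: 0, 1: 0, 2: 0, 3: 0}, demand, 2))
print(predict_traffic({0: 0, 1: 0, 2: 1, 3: 1}, demand, 2))
```

In this example both placements put the same load on each memory controller. Packing all threads onto socket 0 concentrates the cross-socket traffic in one direction, while spreading them splits it across both directions. A real model would also need to capture effects this sketch ignores, such as non-uniform page placement and the hardware's actual counter semantics.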
Taxonomy
Topics: Parallel Computing and Optimization Techniques · Cloud Computing and Resource Management · Distributed and Parallel Computing Systems
