Efficient Characterization of Hidden Processor Memory Hierarchies
Keith Cooper, Xiaoran Xu

TL;DR
This paper introduces a set of portable, efficient tools that quickly characterize hidden processor memory hierarchies, aiding performance optimization and understanding in cloud and cluster environments.
Contribution
The paper presents novel, fast, and portable tools that automatically derive key parameters of processor memory hierarchies from simple experiments.
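As an illustration of what a "simple experiment" of this kind might look like, the sketch below times a chain of dependent loads over a working set of a chosen size. This is a common microbenchmark pattern, not a method confirmed by this summary. The names `sattolo_cycle`, `chase`, and `time_per_access` are hypothetical. In Python the interpreter overhead dominates the timings, so the numbers only show the shape of the experiment; a real tool would typically be written in a compiled language.

```python
import random
import time


def sattolo_cycle(n, seed=0):
    """Return a permutation of range(n) that forms one single cycle."""
    rng = random.Random(seed)
    nxt = list(range(n))
    for i in range(n - 1, 0, -1):
        j = rng.randrange(i)  # j < i is what makes the result one cycle (Sattolo)
        nxt[i], nxt[j] = nxt[j], nxt[i]
    return nxt


def chase(nxt, steps):
    """Follow the chain for `steps` dependent accesses; return the final index."""
    i = 0
    for _ in range(steps):
        i = nxt[i]  # each access depends on the result of the previous one
    return i


def time_per_access(n, steps=200_000):
    """Average seconds per dependent access for a working set of n elements."""
    nxt = sattolo_cycle(n)
    chase(nxt, n)  # warm-up pass: touches every element once
    start = time.perf_counter()
    chase(nxt, steps)
    return (time.perf_counter() - start) / steps


if __name__ == "__main__":
    # Hypothetical working-set sizes (in elements, not bytes).
    for n in (1 << 10, 1 << 14, 1 << 18):
        print(n, time_per_access(n))
```

The single-cycle permutation matters because it forces every element of the working set to be visited before any is revisited, and the random order makes the access sequence hard for a hardware prefetcher to predict.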
Findings
Tools can derive cache levels, capacities, and latencies in seconds.
Tools are portable and suitable for various deployment contexts.
Automatic analysis of cache response curves enhances performance understanding.
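To make the idea of analyzing a cache response curve concrete, here is a minimal sketch that splits a latency-versus-footprint curve into plateaus and reports one (capacity, latency) pair per plateau. It is not the authors' analysis. The function `find_levels`, the `jump` threshold, and the sample data are all assumptions made for illustration. Measured curves usually have gradual, noisy transitions that a fixed ratio threshold handles poorly.

```python
def find_levels(sizes, latencies, jump=1.5):
    """Split a latency-vs-footprint curve into plateaus.

    sizes:     footprints in bytes, in ascending order.
    latencies: measured latency at each footprint (any consistent unit).
    jump:      ratio between neighbouring points treated as a level
               boundary (an assumed, hand-picked threshold).

    Returns a list of (largest_footprint_in_plateau, median_latency).
    The final plateau is only the last level observed, so its footprint
    is a lower bound rather than a capacity.
    """
    levels = []
    start = 0
    for k in range(1, len(sizes) + 1):
        at_end = k == len(sizes)
        if at_end or latencies[k] > jump * latencies[k - 1]:
            plateau = sorted(latencies[start:k])
            median = plateau[len(plateau) // 2]  # upper median for even counts
            levels.append((sizes[k - 1], median))
            start = k
    return levels


if __name__ == "__main__":
    KIB = 1024
    # Synthetic, made-up curve with two jumps; not measured data.
    sizes = [s * KIB for s in (4, 8, 16, 32, 64, 128, 256, 512, 1024)]
    latencies = [1.0, 1.0, 1.1, 1.0, 4.0, 4.1, 4.0, 12.0, 12.2]
    for capacity, latency in find_levels(sizes, latencies):
        print(capacity // KIB, "KiB", latency)
```

On the synthetic data the sketch reports plateaus ending at 32 KiB and 256 KiB, followed by a final plateau that covers the remaining points.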
Abstract
A processor's memory hierarchy has a major impact on the performance of running code. However, computing platforms that hide the actual hardware characteristics from both the end user and the tools that mediate execution, such as a compiler, a JIT, and a runtime system, are increasingly common, for example in large-scale computation on clouds and clusters. Even worse, in such environments a single computation may use a collection of processors with dissimilar characteristics. Not knowing the performance-critical parameters of the underlying system makes it difficult to improve performance by optimizing the code or adjusting runtime-system behavior; it also makes application performance harder to understand. To address this problem, we have developed a suite of portable tools that can efficiently derive many of the parameters of processor memory hierarchies, such as…
