TL;DR
This paper introduces OSACA, an open-source static analysis tool that predicts the in-core throughput of x86 instruction streams on Intel and AMD microarchitectures. Its predictions feed analytical performance models and help explain how hardware and loop code interact.
Contribution
The paper presents OSACA, a novel open-source static analysis tool for predicting in-core performance of x86 instruction streams on modern microarchitectures.
Findings
OSACA accurately predicts execution times for benchmark kernels, assuming an infinite first-level cache and perfect out-of-order scheduling.
Models built for Skylake and Zen architectures match measured performance.
The approach can be extended to future architectures.
Abstract
An accurate prediction of scheduling and execution of instruction streams is a necessary prerequisite for predicting the in-core performance behavior of throughput-bound loop kernels on out-of-order processor architectures. Such predictions are an indispensable component of analytical performance models, such as the Roofline and the Execution-Cache-Memory (ECM) model, and allow a deep understanding of the performance-relevant interactions between hardware architecture and loop code. We present the Open Source Architecture Code Analyzer (OSACA), a static analysis tool for predicting the execution time of sequential loops comprising x86 instructions under the assumption of an infinite first-level cache and perfect out-of-order scheduling. We show the process of building a machine model from available documentation and semi-automatic benchmarking, and carry it out for the latest Intel…
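To make the idea of a static throughput prediction from a machine model concrete, here is a minimal sketch in Python. It is not OSACA's code or data format: the port names, the instruction names, the `TOY_MODEL` table, and the rule of splitting each micro-op evenly across its admissible ports are all illustrative assumptions. The sketch accumulates per-port pressure for one loop iteration and takes the busiest port as the estimated cycles per iteration, which matches the throughput-bound, perfectly scheduled setting the abstract describes.

```python
from collections import defaultdict

# Hypothetical machine model (NOT real Skylake or Zen data):
# instruction name -> list of micro-ops, each given as
# (ports that can execute it, cycles it occupies a port).
TOY_MODEL = {
    "vaddpd": [(("P0", "P1"), 1.0)],
    "vmulpd": [(("P0", "P1"), 1.0)],
    "vmovapd_load": [(("P2", "P3"), 1.0)],
    "vmovapd_store": [(("P4",), 1.0)],
}


def port_pressure(kernel, model):
    """Accumulate the cycles each port is busy during one loop iteration.

    Assumption: every micro-op is split evenly across its admissible
    ports, a simple stand-in for an idealized out-of-order scheduler.
    """
    pressure = defaultdict(float)
    for instr in kernel:
        for ports, cycles in model[instr]:
            share = cycles / len(ports)
            for port in ports:
                pressure[port] += share
    return dict(pressure)


def predict_cycles(kernel, model):
    """Estimate cycles per iteration as the load on the busiest port."""
    pressure = port_pressure(kernel, model)
    return max(pressure.values()) if pressure else 0.0


# A triad-like loop body: two loads, a multiply, an add, and a store.
triad = ["vmovapd_load", "vmovapd_load", "vmulpd", "vaddpd", "vmovapd_store"]
print(port_pressure(triad, TOY_MODEL))
print(predict_cycles(triad, TOY_MODEL))
```

In this toy model the estimate ignores instruction latencies, loop-carried dependencies, and everything beyond the first-level cache, so it is only meaningful for throughput-bound kernels. Pairing such an in-core estimate with data-transfer terms is the job of models like Roofline and ECM.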
