TL;DR
This paper introduces a simulation-based throughput predictor for basic blocks on Intel Core microarchitectures. By modeling the pipeline in detail, it reaches an average error near 1% against hardware measurements, where existing models have been shown to lie between 9% and 36%.
Contribution
The authors develop a simple analytical baseline model and a detailed simulation-based predictor. The predictor achieves an average prediction error near 1% across multiple Intel microarchitectures, well below the error of prior models.
Findings
Predictions are within about 1% of hardware measurements on average
Simple baseline model is already competitive with the state of the art
Microarchitectural details are crucial for accuracy
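The error figures quoted on this page compare predicted throughput with throughput measured on hardware. A minimal sketch of one common way to compute such an average error is the mean absolute percentage error; whether the paper uses exactly this definition is an assumption, and the function name and sample numbers below are made up for illustration.

```python
def mean_absolute_percentage_error(predicted, measured):
    """Average of |predicted - measured| / measured, as a percentage.

    Assumption: this is one plausible reading of "average error";
    the paper's exact metric is not stated on this page.
    """
    if len(predicted) != len(measured) or not measured:
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for p, m in zip(predicted, measured):
        total += abs(p - m) / m  # relative error for one basic block
    return 100.0 * total / len(measured)


# Hypothetical throughputs in cycles per iteration, not data from the paper.
measured = [1.0, 2.0, 4.0]
predicted = [1.0, 2.2, 3.8]
error = mean_absolute_percentage_error(predicted, measured)
print(f"average error: {error:.1f}%")
```

With these sample values the relative errors are 0%, 10%, and 5%, so the average comes out at about 5%.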
Abstract
Performance models that statically predict the steady-state throughput of basic blocks on particular microarchitectures, such as IACA, Ithemal, llvm-mca, OSACA, or CQA, can guide optimizing compilers and aid manual software optimization. However, their utility heavily depends on the accuracy of their predictions. The average error of existing models compared to measurements on the actual hardware has been shown to lie between 9% and 36%. But how good is this? To answer this question, we propose an extremely simple analytical throughput model that may serve as a baseline. Surprisingly, this model is already competitive with the state of the art, indicating that there is significant potential for improvement. To explore this potential, we develop a simulation-based throughput predictor. To this end, we propose a detailed parametric pipeline model that supports all Intel Core…
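The abstract is cut off before it defines the baseline, so the following is only a sketch of what an "extremely simple analytical throughput model" could look like, not the authors' model. It assumes a bottleneck-style bound: throughput is limited by whichever is slowest among instruction issue, loads, and stores. The function name and the default limits (`issue_width=4`, `loads_per_cycle=2`, `stores_per_cycle=1`) are hypothetical parameters chosen for illustration.

```python
def baseline_throughput(num_instructions, num_loads, num_stores,
                        issue_width=4, loads_per_cycle=2, stores_per_cycle=1):
    """Estimate steady-state cycles per iteration of a basic block.

    Hypothetical bottleneck model (an assumption, not taken from the paper):
    the block can run no faster than its most constrained resource.
    """
    return max(
        num_instructions / issue_width,   # front-end issue bound
        num_loads / loads_per_cycle,      # memory read bound
        num_stores / stores_per_cycle,    # memory write bound
    )


# Hypothetical block: 8 instructions, of which 2 load and 3 store.
cycles = baseline_throughput(num_instructions=8, num_loads=2, num_stores=3)
print(cycles)  # bounds are 2.0, 1.0, 3.0; the store bound dominates
```

A model of this shape ignores dependencies, port contention, and front-end effects, which is the kind of microarchitectural detail a simulation-based predictor would have to capture.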
