TL;DR
This paper extends the OSACA static analysis tool to support ARM architectures and critical path prediction, enabling cross-architecture performance modeling of loop kernels on various micro-architectures.
Contribution
The authors extend OSACA, which previously supported only x86 (Intel and AMD) and optimistic full-throughput execution, with ARM instruction support and critical path analysis, so that a single static analysis tool can model loop kernels across architectures.
Findings
Predictions closely match actual runtime measurements.
Extended OSACA supports multiple architectures including x86 and ARM.
Critical path analysis improves understanding of performance bottlenecks.
Abstract
Useful models of loop kernel runtimes on out-of-order architectures require an analysis of the in-core performance behavior of instructions and their dependencies. While an instruction throughput prediction sets a lower bound to the kernel runtime, the critical path defines an upper bound. Such predictions are an essential part of analytic (i.e., white-box) performance models like the Roofline and Execution-Cache-Memory (ECM) models. They enable a better understanding of the performance-relevant interactions between hardware architecture and loop code. The Open Source Architecture Code Analyzer (OSACA) is a static analysis tool for predicting the execution time of sequential loops. It previously supported only x86 (Intel and AMD) architectures and simple, optimistic full-throughput execution. We have heavily extended OSACA to support ARM instructions and critical path prediction…
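The two bounds described in the abstract can be illustrated with a small sketch. This is not OSACA's code or API: the kernel, the port names (P0 to P3), the latencies, and the helper functions below are all hypothetical, chosen only to show how a throughput bound (the busiest execution port) and a critical path (the longest latency chain through the dependency graph) are computed for one loop iteration.

```python
from collections import defaultdict

# Hypothetical toy kernel, NOT taken from the paper or from OSACA's machine models.
# name -> (latency in cycles, {port: cycles of port occupancy}, list of dependencies)
KERNEL = {
    "load_a": (4, {"P2": 1.0}, []),
    "load_b": (4, {"P2": 1.0}, []),
    "mul":    (4, {"P0": 1.0}, ["load_a", "load_b"]),
    "add":    (4, {"P0": 0.5, "P1": 0.5}, ["mul"]),
    "store":  (1, {"P3": 1.0}, ["add"]),
}


def throughput_bound(kernel):
    """Lower bound on cycles per iteration: occupancy of the busiest port."""
    pressure = defaultdict(float)
    for _latency, ports, _deps in kernel.values():
        for port, cycles in ports.items():
            pressure[port] += cycles
    return max(pressure.values())


def critical_path(kernel):
    """Upper bound on cycles per iteration: longest latency chain in the DAG."""
    memo = {}

    def finish(name):
        # Earliest cycle at which `name` can complete, given its dependencies.
        if name not in memo:
            latency, _ports, deps = kernel[name]
            memo[name] = latency + max((finish(d) for d in deps), default=0)
        return memo[name]

    return max(finish(name) for name in kernel)


print(throughput_bound(KERNEL))  # 2.0 (port P2 serves both loads)
print(critical_path(KERNEL))     # 13  (load -> mul -> add -> store)
```

In this toy case the measured runtime per iteration would be expected to fall between 2 and 13 cycles, depending on how much of the dependency chain the out-of-order core can overlap across iterations.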
