How to keep pushing ML accelerator performance? Know your rooflines!
Marian Verhelst, Luca Benini, Naveen Verma

TL;DR
This paper surveys trends in ML hardware accelerators, introduces an enhanced roofline model to analyze their performance, and provides insights for optimizing efficiency and throughput in ML systems.
Contribution
The paper presents an enhanced roofline framework tailored to ML accelerators, giving a unified view of how compute and memory technologies and architectures interact in order to guide performance optimization.
Findings
The enhanced roofline model effectively characterizes ML accelerator performance
Examples demonstrate how to identify bottlenecks and optimize designs
The framework reveals open research opportunities for further improvements
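For readers new to rooflines, the bottleneck identification mentioned in the findings can be illustrated with the classic roofline bound, where attainable throughput is the minimum of the compute roof and memory bandwidth times operational intensity. The sketch below is not the paper's enhanced model, only the textbook baseline it builds on. The function names, the accelerator numbers (100 TOPS peak, 1 TB/s bandwidth), and the two layer intensities are hypothetical values chosen for illustration.

```python
def attainable_throughput(peak_ops_per_s, mem_bw_bytes_per_s, intensity_ops_per_byte):
    """Classic roofline bound: min(compute roof, bandwidth * operational intensity)."""
    return min(peak_ops_per_s, mem_bw_bytes_per_s * intensity_ops_per_byte)


def ridge_point(peak_ops_per_s, mem_bw_bytes_per_s):
    """Operational intensity (ops/byte) at which the two roofs intersect."""
    return peak_ops_per_s / mem_bw_bytes_per_s


def classify(peak_ops_per_s, mem_bw_bytes_per_s, intensity_ops_per_byte):
    """Label a workload by which roof limits it."""
    if intensity_ops_per_byte < ridge_point(peak_ops_per_s, mem_bw_bytes_per_s):
        return "memory-bound"
    return "compute-bound"


# Hypothetical accelerator (illustrative numbers, not taken from the paper).
PEAK = 100e12  # peak compute: 100 tera-ops per second
BW = 1e12      # off-chip memory bandwidth: 1 terabyte per second

# Hypothetical workloads with different data reuse (ops per byte moved).
workloads = [("low-reuse layer", 10.0), ("high-reuse layer", 400.0)]

for name, intensity in workloads:
    bound = attainable_throughput(PEAK, BW, intensity)
    regime = classify(PEAK, BW, intensity)
    print(f"{name}: {bound / 1e12:.0f} TOPS attainable, {regime}")
```

With these assumed numbers the ridge point sits at 100 ops/byte, so a workload below that intensity gains from higher bandwidth or more data reuse, while one above it gains only from more compute.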
Abstract
The rapidly growing importance of Machine Learning (ML) applications, coupled with their ever-increasing model size and inference energy footprint, has created a strong need for specialized ML hardware architectures. Numerous ML accelerators have been explored and implemented, primarily to increase task-level throughput per unit area and reduce task-level energy consumption. This paper surveys key trends toward these objectives for more efficient ML accelerators and provides a unifying framework to understand how compute and memory technologies/architectures interact to enhance system-level efficiency and performance. To achieve this, the paper introduces an enhanced version of the roofline model and applies it to ML accelerators as an effective tool for understanding where various execution regimes fall within roofline bounds and how to maximize performance and efficiency under the…
