Applying the Roofline model for Deep Learning performance optimizations
Jacek Czaja, Michal Gallus, Joanna Wozna, Adam Grygielski, Luo Tao

TL;DR
This paper introduces an automated methodology for constructing Roofline performance models tailored for NUMA architectures, exemplified with Intel Xeon, and evaluates optimized deep learning primitives from Intel oneDNN.
Contribution
The paper presents a novel automated approach for Roofline modeling on NUMA systems and assesses the performance of optimized deep learning primitives.
Findings
Automated Roofline models effectively represent NUMA system performance.
Optimized deep learning primitives show significant performance improvements.
The methodology facilitates performance analysis and optimization for complex architectures.
Abstract
In this paper, we present a methodology for automatically creating Roofline models for Non-Uniform Memory Access (NUMA) systems, using Intel Xeon as an example. We also present an evaluation of highly efficient deep learning primitives as implemented in the Intel oneDNN library.
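For readers unfamiliar with the Roofline model, the sketch below illustrates the standard bound it is built on: attainable performance is the minimum of the peak compute rate and the product of arithmetic intensity and peak memory bandwidth. This is not the paper's automated methodology. The function names and the per-scope peak values (one socket vs. two sockets) are hypothetical placeholders chosen for illustration, not measurements from the paper.

```python
def attainable_flops(arithmetic_intensity, peak_flops, peak_bandwidth):
    """Roofline bound: min(peak compute, arithmetic intensity * peak bandwidth).

    arithmetic_intensity: FLOPs performed per byte moved from memory.
    peak_flops:           peak compute rate in FLOP/s.
    peak_bandwidth:       peak memory bandwidth in bytes/s.
    """
    if arithmetic_intensity < 0 or peak_flops <= 0 or peak_bandwidth <= 0:
        raise ValueError("intensity must be >= 0 and both peaks must be > 0")
    return min(peak_flops, arithmetic_intensity * peak_bandwidth)


def ridge_point(peak_flops, peak_bandwidth):
    """Arithmetic intensity at which the memory roof meets the compute roof."""
    return peak_flops / peak_bandwidth


def classify(arithmetic_intensity, peak_flops, peak_bandwidth):
    """Label a kernel as memory-bound or compute-bound under a given roof."""
    if arithmetic_intensity < ridge_point(peak_flops, peak_bandwidth):
        return "memory-bound"
    return "compute-bound"


# Hypothetical (peak FLOP/s, peak bytes/s) pairs, one roof per execution scope.
# On a NUMA machine each scope gets its own roof; these values are invented.
SCOPES = {
    "one_socket": (1000.0e9, 100.0e9),
    "two_sockets": (2000.0e9, 200.0e9),
}

if __name__ == "__main__":
    kernel_intensity = 4.0  # hypothetical kernel: 4 FLOPs per byte
    for scope, (flops, bandwidth) in SCOPES.items():
        bound = attainable_flops(kernel_intensity, flops, bandwidth)
        label = classify(kernel_intensity, flops, bandwidth)
        print(f"{scope}: {bound / 1e9:.1f} GFLOP/s attainable ({label})")
```

Keeping one roof per execution scope reflects why NUMA systems need more than a single Roofline plot: the compute and bandwidth peaks both change with the number of sockets a workload spans.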
Taxonomy
Topics: Parallel Computing and Optimization Techniques · Advanced Data Storage Technologies · Advanced Neural Network Applications
