Applying the Roofline model for Deep Learning performance optimizations

Jacek Czaja; Michal Gallus; Joanna Wozna; Adam Grygielski; Luo Tao

arXiv:2009.11224 · cs.DC · September 24, 2020 · 1 citation

Open Access

TL;DR

This paper introduces an automated methodology for constructing Roofline performance models for NUMA architectures, using Intel Xeon as an example, and evaluates optimized deep learning primitives from the Intel oneDNN library.
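For reference, the standard Roofline bound relates a kernel's attainable performance $P$ to the machine's peak compute rate $\pi$, its memory bandwidth $\beta$, and the kernel's arithmetic intensity $I$ (work $W$ in FLOPs divided by memory traffic $Q$ in bytes). The per-node subscript on $\beta$ is an illustrative assumption showing how a NUMA system can expose different memory roofs depending on where the data resides; it is not notation taken from the paper.

```latex
I = \frac{W}{Q}, \qquad
P_{\text{node}}(I) = \min\left(\pi,\; \beta_{\text{node}} \cdot I\right), \qquad
I_{\text{ridge}} = \frac{\pi}{\beta_{\text{node}}}
```

A kernel with $I < I_{\text{ridge}}$ is memory-bound on that node; above the ridge point it is compute-bound.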

Contribution

The paper presents a novel automated approach for Roofline modeling on NUMA systems and assesses the performance of optimized deep learning primitives.

Findings

1. Automated Roofline models effectively represent NUMA system performance.
2. Optimized deep learning primitives show significant performance improvements.
3. The methodology facilitates performance analysis and optimization for complex architectures.

Abstract

In this paper, we present a methodology for automatically creating Roofline models for Non-Uniform Memory Access (NUMA) systems, using Intel Xeon as an example. We then present an evaluation of highly efficient deep learning primitives as implemented in the Intel oneDNN library.
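The following is a minimal sketch of how a Roofline model bounds a kernel on a NUMA machine, not the paper's automated methodology. It assumes a hypothetical socket whose compute roof is the same for local and remote data but whose memory roof drops when data lives on the remote NUMA node. All hardware numbers, and the names `Roofline` and `gemm_intensity`, are illustrative assumptions; the GEMM intensity uses an idealized traffic model in which each matrix moves through memory exactly once.

```python
# Minimal Roofline sketch for a hypothetical two-socket NUMA machine.
# All hardware numbers are illustrative assumptions, not measurements
# from the paper; Roofline and gemm_intensity are hypothetical helpers.
from dataclasses import dataclass


@dataclass(frozen=True)
class Roofline:
    peak_gflops: float    # compute roof in GFLOP/s
    bandwidth_gbs: float  # memory roof in GB/s

    def attainable(self, intensity: float) -> float:
        """Attainable GFLOP/s for a kernel with the given FLOP/byte intensity."""
        return min(self.peak_gflops, self.bandwidth_gbs * intensity)

    def ridge_point(self) -> float:
        """Intensity (FLOP/byte) where the kernel turns from memory- to compute-bound."""
        return self.peak_gflops / self.bandwidth_gbs


def gemm_intensity(m: int, n: int, k: int, bytes_per_elem: int = 4) -> float:
    """Idealized intensity of C = A @ B, assuming each matrix moves once (fp32 by default)."""
    flops = 2 * m * n * k
    traffic = bytes_per_elem * (m * k + k * n + m * n)
    return flops / traffic


# Hypothetical roofs for one socket: data on the local node vs. the remote node.
local = Roofline(peak_gflops=2000.0, bandwidth_gbs=100.0)
remote = Roofline(peak_gflops=2000.0, bandwidth_gbs=40.0)

for shape in [(64, 64, 64), (1024, 1024, 1024)]:
    ai = gemm_intensity(*shape)
    print(f"GEMM {shape}: AI={ai:.1f} FLOP/B, "
          f"local={local.attainable(ai):.0f} GFLOP/s, "
          f"remote={remote.attainable(ai):.0f} GFLOP/s")
```

With these assumed numbers, the small GEMM falls below both ridge points and is memory-bound, so placing its data on the remote node lowers its bound; the large GEMM is compute-bound on either node. This is why a single Roofline per machine can be misleading on NUMA systems.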

Taxonomy

Topics: Parallel Computing and Optimization Techniques · Advanced Data Storage Technologies · Advanced Neural Network Applications