Time-Based Roofline for Deep Learning Performance Analysis
Yunsong Wang, Charlene Yang, Steven Farrell, Yan Zhang, Thorsten Kurth, Samuel Williams

TL;DR
This paper introduces a time-based Roofline model tailored for deep learning, enabling systematic performance analysis and optimization of compute-intensive kernels like convolution and LSTM.
Contribution
It extends the traditional Roofline model by incorporating runtime factors, providing a new systematic tool for deep learning performance analysis.
Findings
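The model is validated on two sets of representative kernels: 2D convolution and long short-term memory (LSTM).
It shows how arithmetic intensity, cache locality, auto-tuning, kernel launch overhead, and Tensor Core usage affect performance.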
Identification of optimization opportunities for deep learning workloads.
Abstract
Deep learning applications are usually very compute-intensive and require a long run time for training and inference. This has been tackled by researchers from both hardware and software sides, and in this paper, we propose a Roofline-based approach to performance analysis to facilitate the optimization of these applications. This approach is an extension of the Roofline model widely used in traditional high-performance computing applications, and it incorporates both compute/bandwidth complexity and run time in its formulae to provide insights into deep learning-specific characteristics. We take two sets of representative kernels, 2D convolution and long short-term memory, to validate and demonstrate the use of this new approach, and investigate how arithmetic intensity, cache locality, auto-tuning, kernel launch overhead, and Tensor Core usage can affect performance. Compared to the…
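To make the idea of putting run time into a Roofline analysis concrete, here is a minimal Python sketch. It is not the paper's formulation, whose exact formulae are not reproduced in this summary. It uses only the standard Roofline bounds: compute time is FLOPs divided by peak FLOP/s, memory time is bytes moved divided by peak bandwidth, and the larger of the two is a lower bound on run time. The names `Kernel` and `time_roofline`, the hardware peaks, and the kernel numbers are all hypothetical and chosen for round arithmetic.

```python
from dataclasses import dataclass


@dataclass
class Kernel:
    """Hypothetical per-kernel measurements (illustrative values only)."""
    name: str
    flops: float        # floating-point operations performed
    bytes_moved: float  # bytes transferred to/from memory
    runtime_s: float    # measured run time in seconds


def time_roofline(kernel: Kernel, peak_flops: float, peak_bw: float) -> dict:
    """Express the classic Roofline bounds in units of time.

    peak_flops is the machine's peak FLOP/s and peak_bw its peak memory
    bandwidth in bytes/s; both are assumed inputs, not values from the paper.
    """
    compute_time = kernel.flops / peak_flops      # time if compute-bound
    memory_time = kernel.bytes_moved / peak_bw    # time if bandwidth-bound
    bound_time = max(compute_time, memory_time)   # lower bound on run time
    return {
        "arithmetic_intensity": kernel.flops / kernel.bytes_moved,
        "compute_time_s": compute_time,
        "memory_time_s": memory_time,
        "bound": "compute" if compute_time >= memory_time else "memory",
        # Fraction of the measured run time explained by the bound.
        "efficiency": bound_time / kernel.runtime_s,
    }


# Made-up kernel on a made-up machine (10 TFLOP/s, 1 TB/s).
result = time_roofline(
    Kernel(name="conv2d_example", flops=2e12, bytes_moved=1e9, runtime_s=0.5),
    peak_flops=1e13,
    peak_bw=1e12,
)
print(result)
```

In this sketch, a kernel whose measured run time is far above both bounds is limited by something the two ceilings do not capture. The abstract names kernel launch overhead as one such factor.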
Taxonomy
Topics: Parallel Computing and Optimization Techniques · Advanced Data Storage Technologies · Advanced Neural Network Applications
Methods: Convolution
