Optimizing Memory-Access Patterns for Deep Learning Accelerators

Hongbin Zheng; Sejong Oh; Huiqing Wang; Preston Briggs; Jiading Gai; Animesh Jain; Yizhi Liu; Rich Heaton; Randy Huang; Yida Wang

arXiv:2002.12798·cs.PF·March 2, 2020·6 cites

TL;DR

This paper introduces a systematic method, based on the polyhedral model, for optimizing memory-access patterns on deep learning accelerators. It improves performance by reducing the number of memory accesses.

Contribution

Using the polyhedral model, the approach analyzes all operators of a DL model together to minimize the number of memory accesses.

Findings

01 Reduces memory-access overhead for common neural-network models

02 Improves performance on the AWS Inferentia inference chip

03 Experiments show a substantial reduction in the impact of memory accesses

Abstract

Deep learning (DL) workloads are moving towards accelerators for faster processing and lower cost. Modern DL accelerators are good at handling the large-scale multiply-accumulate operations that dominate DL workloads; however, it is challenging to make full use of the compute power of an accelerator since the data must be properly staged in a software-managed scratchpad memory. Failing to do so can result in significant performance loss. This paper proposes a systematic approach which leverages the polyhedral model to analyze all operators of a DL model together to minimize the number of memory accesses. Experiments show that our approach can substantially reduce the impact of memory accesses required by common neural-network models on a homegrown AWS machine-learning inference chip named Inferentia, which is available through Amazon EC2 Inf1 instances.
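To make the staging problem concrete, here is a toy sketch of why looking at several operators together can cut off-chip traffic. It is not the paper's polyhedral analysis. It is a simple counting model that assumes a chain of elementwise operators, a scratchpad that holds one tile at a time, and one DRAM access per element loaded or stored. The function names and parameters are hypothetical.

```python
def dram_accesses_per_operator(num_elements, num_ops):
    """Count DRAM accesses when each operator is handled on its own.

    Assumption: every operator loads its whole input from DRAM and
    stores its whole output back to DRAM before the next one starts.
    """
    loads = 0
    stores = 0
    for _ in range(num_ops):
        loads += num_elements   # read the operator's input
        stores += num_elements  # write the operator's output
    return loads + stores


def dram_accesses_staged(num_elements, num_ops, tile_size):
    """Count DRAM accesses when the operator chain is handled together.

    Assumption: a tile is loaded into the scratchpad once, all num_ops
    elementwise operators run on it there, and only the final result
    is written back to DRAM.
    """
    loads = 0
    stores = 0
    for start in range(0, num_elements, tile_size):
        tile = min(tile_size, num_elements - start)
        loads += tile   # one DRAM read per element of the tile
        # the num_ops operators run here without touching DRAM
        stores += tile  # one DRAM write per element of the final result
    return loads + stores


if __name__ == "__main__":
    n, ops, tile = 1024, 3, 128
    print("per-operator:", dram_accesses_per_operator(n, ops))
    print("staged:      ", dram_accesses_staged(n, ops, tile))
```

In this model the per-operator count grows with the number of operators, while the staged count stays at one load and one store per element. Real accelerators add constraints this sketch ignores, such as scratchpad capacity, non-elementwise operators, and data layout.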

Code & Models

Repositories

puzzlef/graph-csr-openmp

Taxonomy

Topics: Advanced Neural Network Applications · Parallel Computing and Optimization Techniques · Adversarial Robustness in Machine Learning