Optimally Scheduling CNN Convolutions for Efficient Memory Access

Arthur Stoutchinin; Francesco Conti; Luca Benini

arXiv:1902.01492 · cs.NE · February 6, 2019 · 21 citations

Open Access

TL;DR

This paper introduces an analytical memory bandwidth model for optimizing CNN convolution loop-nest schedules in embedded inference engines, shows that the optimal schedules are implementable in practice, and reports up to 14x memory bandwidth reduction.

Contribution

The paper presents a memory bandwidth model for CNN convolutions that is more accurate than previously published models, and introduces an accelerator architecture, the Hardware Convolution Block (HWC), that implements the optimal schedules the model identifies.

Findings

1. Model outperforms previous models in accuracy
2. Achieves up to 14x bandwidth reduction
3. Optimal schedules are practical to implement

Abstract

Embedded inference engines for convolutional networks must be parsimonious in memory bandwidth and buffer sizing to meet power and cost constraints. We present an analytical memory bandwidth model for loop-nest optimization targeting architectures with application-managed buffers. We applied this model to optimize the CNN convolution loop-nest. We show that our model is more accurate than previously published models. Using this model, we can identify non-trivial dataflow schedules that result in the lowest communication bandwidth given tight local buffering constraints. We show that optimal dataflow schedules are implementable in practice and that our model is accurate with respect to a real implementation; moreover, we introduce an accelerator architecture, named Hardware Convolution Block (HWC), which implements the optimal schedules, and we show it achieves up to 14x memory bandwidth…
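To make the idea of choosing a schedule under a buffer constraint concrete, the following is a minimal sketch and not the paper's model. It assumes a stride-1 convolution with M output channels, C input channels, an H x W output, and a K x K kernel, tiled by (Tm, Tc, Th, Tw), with each output tile held in the local buffer while input-channel tiles stream through. The function names `divisors`, `tile_cost`, and `best_tiling`, the traffic formula, and the exhaustive search are all illustrative assumptions.

```python
from itertools import product


def divisors(n):
    """All positive divisors of n, so that tiles divide each dimension evenly."""
    return [d for d in range(1, n + 1) if n % d == 0]


def tile_cost(M, C, H, W, K, Tm, Tc, Th, Tw):
    """Return (buffer_words, traffic_words) for one tiling.

    Assumed schedule (illustrative, not taken from the paper): each output
    tile stays in the local buffer while input-channel tiles stream through.
    """
    in_tile = Tc * (Th + K - 1) * (Tw + K - 1)  # input patch including halo
    w_tile = Tm * Tc * K * K                    # weights for this tile pair
    out_tile = Tm * Th * Tw                     # outputs kept on chip
    buffer_words = in_tile + w_tile + out_tile
    n_out_tiles = (M // Tm) * (H // Th) * (W // Tw)
    n_in_steps = C // Tc
    # Per output tile: reload inputs and weights for every input-channel
    # step, then write the finished output tile once.
    traffic = n_out_tiles * (n_in_steps * (in_tile + w_tile) + out_tile)
    return buffer_words, traffic


def best_tiling(M, C, H, W, K, buffer_limit):
    """Search all evenly dividing tilings that fit the buffer.

    Returns (traffic, buffer_words, (Tm, Tc, Th, Tw)) or None if nothing fits.
    """
    candidates = []
    for Tm, Tc, Th, Tw in product(divisors(M), divisors(C),
                                  divisors(H), divisors(W)):
        buf, traffic = tile_cost(M, C, H, W, K, Tm, Tc, Th, Tw)
        if buf <= buffer_limit:
            candidates.append((traffic, buf, (Tm, Tc, Th, Tw)))
    return min(candidates) if candidates else None


if __name__ == "__main__":
    # Hypothetical layer and buffer sizes, chosen only for illustration.
    for limit in (64, 256, 1024):
        print(limit, best_tiling(M=16, C=16, H=8, W=8, K=3, buffer_limit=limit))
```

Running the script prints the lowest-traffic tiling found for each hypothetical buffer limit. A real analytical model would also account for loop order, data reuse across neighboring tiles, and partial-sum traffic, which this sketch ignores.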

Taxonomy

Topics: Advanced Neural Network Applications · Advanced Memory and Neural Computing · Parallel Computing and Optimization Techniques

Methods: Convolution