Optimally Scheduling CNN Convolutions for Efficient Memory Access
Arthur Stoutchinin, Francesco Conti, Luca Benini

TL;DR
This paper introduces an analytical model for scheduling CNN convolutions to minimize memory bandwidth in embedded inference engines, shows that the resulting optimal schedules can be implemented in hardware, and reports up to 14x memory bandwidth reduction.
Contribution
The paper presents an analytical memory bandwidth model for CNN convolutions that is more accurate than previously published models, and designs an accelerator, the Hardware Convolution Block (HWC), that implements the optimal schedules the model identifies.
Findings
The model is more accurate than previously published models
The HWC accelerator achieves up to 14x memory bandwidth reduction
Optimal schedules are practical to implement in hardware
Abstract
Embedded inference engines for convolutional networks must be parsimonious in memory bandwidth and buffer sizing to meet power and cost constraints. We present an analytical memory bandwidth model for loop-nest optimization targeting architectures with application-managed buffers. We apply this model to optimize the CNN convolution loop nest. We show that our model is more accurate than previously published models. Using this model, we can identify non-trivial dataflow schedules that result in the lowest communication bandwidth under tight local buffering constraints. We show that optimal dataflow schedules are implementable in practice and that our model is accurate with respect to a real implementation; moreover, we introduce an accelerator architecture, named Hardware Convolution Block (HWC), which implements the optimal schedules, and we show it achieves up to 14x memory bandwidth…
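To make the idea of a loop-nest bandwidth model concrete, here is a minimal sketch. It is not the authors' model, which this page does not detail. It assumes a single fixed loop order (output tile kept on chip while the input-channel reduction runs innermost), stride 1, no padding, tile sizes restricted to exact divisors of each dimension, and traffic counted in words. The names `traffic_and_buffer`, `best_schedule`, and the symbols K, C, H, W, R, S (output channels, input channels, output height/width, kernel height/width) and tk, tc, th, tw (tile sizes) are hypothetical. Exhaustive search over tile sizes stands in for whatever optimization procedure the paper uses.

```python
from itertools import product


def divisors(n):
    """All positive divisors of n, in ascending order."""
    return [d for d in range(1, n + 1) if n % d == 0]


def traffic_and_buffer(K, C, H, W, R, S, tk, tc, th, tw):
    """Off-chip words moved and on-chip words needed for one tiled schedule.

    Hypothetical simplified model (not the paper's). Assumed loop order:
        for k-tile, for h-tile, for w-tile:   # output tile stays on chip
            for c-tile:                       # reduction over input channels
                load input tile and weight tile, accumulate into output tile
            write the finished output tile once
    Stride 1, no padding: the input feature map is (H+R-1) x (W+S-1).
    """
    nk, nc, nh, nw = K // tk, C // tc, H // th, W // tw
    in_tile = tc * (th + R - 1) * (tw + S - 1)  # input tile including halo
    w_tile = tk * tc * R * S                    # weight tile
    out_tile = tk * th * tw                     # output (partial-sum) tile
    n_steps = nk * nh * nw * nc                 # innermost tile iterations
    traffic = n_steps * (in_tile + w_tile) + K * H * W  # loads + one output write
    buffer = in_tile + w_tile + out_tile
    return traffic, buffer


def best_schedule(K, C, H, W, R, S, budget):
    """Brute-force search for the divisor tiling with the least traffic
    whose buffer footprint fits in `budget` words.

    Returns (traffic, (tk, tc, th, tw), buffer), or None if nothing fits.
    """
    best = None
    for tk, tc, th, tw in product(divisors(K), divisors(C), divisors(H), divisors(W)):
        traffic, buffer = traffic_and_buffer(K, C, H, W, R, S, tk, tc, th, tw)
        if buffer <= budget and (best is None or traffic < best[0]):
            best = (traffic, (tk, tc, th, tw), buffer)
    return best


if __name__ == "__main__":
    # Example layer: 3x3 kernel, 32 input/output channels, 16x16 output,
    # with an assumed 8192-word on-chip buffer.
    print(best_schedule(32, 32, 16, 16, 3, 3, budget=8192))
```

Shrinking `budget` forces smaller tiles and raises traffic, which is the bandwidth/buffer trade-off the abstract describes. In this particular loop order the input-channel tile `tc` changes only the buffer footprint, not the traffic, so the search returns the smallest `tc` among equally good schedules. A fuller model would also have to compare loop orders, which this sketch fixes in advance.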
Taxonomy
Topics: Advanced Neural Network Applications · Advanced Memory and Neural Computing · Parallel Computing and Optimization Techniques
Methods: Convolution
