Low-memory GEMM-based convolution algorithms for deep neural networks
Andrew Anderson, Aravind Vasudevan, Cormac Keane, David Gregg

TL;DR
This paper introduces two novel GEMM-based convolution algorithms that significantly reduce memory usage in deep neural networks while maintaining speed and improving data locality on multi-core systems, which is especially beneficial for memory-constrained embedded devices.
Contribution
The paper presents two new GEMM-based convolution algorithms that need far less auxiliary space than existing approaches while matching or exceeding their speed and scaling better across multiple cores.
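One way a GEMM-based convolution can avoid building the K²-times-larger im2col patch matrix is a shift-and-add decomposition (the kn2row-style pattern). Each of the K² kernel taps becomes a 1 × 1 convolution, which is an (M × C) by (C × HW) GEMM. Its M × H × W result is then shifted and accumulated into the output. This is a minimal sketch showing that an O(MHW) scratch buffer can suffice. It is not presented as either of the paper's two algorithms. It assumes stride 1, zero "same" padding, odd K, the cross-correlation convention used by DNN frameworks, and nested lists for tensors (input C × H × W, kernel M × C × K × K). `direct_conv` and `conv_shift_add` are hypothetical names.

```python
def direct_conv(inp, ker):
    """Reference 'same'-padded, stride-1 convolution, used only to check the sketch."""
    C, H, W = len(inp), len(inp[0]), len(inp[0][0])
    M, K = len(ker), len(ker[0][0])
    p = K // 2
    return [[[sum(ker[m][c][i][j] * inp[c][y + i - p][x + j - p]
                  for c in range(C) for i in range(K) for j in range(K)
                  if 0 <= y + i - p < H and 0 <= x + j - p < W)
              for x in range(W)] for y in range(H)] for m in range(M)]


def conv_shift_add(inp, ker):
    """Hypothetical shift-and-add convolution: K*K 1x1 GEMMs plus one M x H x W scratch buffer."""
    C, H, W = len(inp), len(inp[0]), len(inp[0][0])  # input is C x H x W
    M, K = len(ker), len(ker[0][0])                  # kernel is M x C x K x K
    p = K // 2
    out = [[[0] * W for _ in range(H)] for _ in range(M)]
    tmp = [[[0] * W for _ in range(H)] for _ in range(M)]  # only auxiliary buffer: M*H*W elements
    for i in range(K):
        for j in range(K):
            # 1x1 convolution with kernel tap (i, j): (M x C) @ (C x H*W), written into tmp
            for m in range(M):
                for y in range(H):
                    for x in range(W):
                        tmp[m][y][x] = sum(ker[m][c][i][j] * inp[c][y][x] for c in range(C))
            # Shift the partial result by (i - p, j - p) and accumulate; pixels shifted in
            # from outside the image correspond to zero padding, so they are skipped
            for m in range(M):
                for y in range(H):
                    for x in range(W):
                        yy, xx = y + i - p, x + j - p
                        if 0 <= yy < H and 0 <= xx < W:
                            out[m][y][x] += tmp[m][yy][xx]
    return out
```

The scratch buffer `tmp` is reused for all K² taps, so peak extra memory stays at M·H·W elements regardless of K.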
Findings
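The two algorithms require only O(MHW) and O(KW) additional space, respectively, where M is the number of output channels and K is the kernel size.
Their performance matches or exceeds that of patch-building approaches such as im2col.
They perform especially well on multi-core processors because of better data locality.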
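To put the space bounds above in perspective, the sketch below compares them with the classical im2col patch matrix, which has (K²C) × (HW) elements. The layer shape is hypothetical, and every big-O constant is taken as 1, so the figures show orders of magnitude rather than the paper's exact buffer sizes.

```python
def aux_elements(C, H, W, K, M):
    """Auxiliary-buffer sizes in elements, with every big-O constant taken as 1 (illustrative only)."""
    return {
        "im2col": K * K * C * H * W,  # (K*K*C) x (H*W) patch matrix
        "O(MHW)": M * H * W,
        "O(KW)": K * W,
    }


# Hypothetical 3x3 layer: 64 input and 64 output channels on a 56x56 feature map
sizes = aux_elements(C=64, H=56, W=56, K=3, M=64)
for name, n in sizes.items():
    print(f"{name:>7}: {n:>9,} elements, {4 * n:>10,} bytes as float32")
```

For this layer the im2col buffer holds 1,806,336 elements (about 6.9 MiB in float32). That is K² = 9 times the 200,704 elements of an M × H × W buffer, because M = C here. K × W is only 168.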
Abstract
Deep neural networks (DNNs) require very large amounts of computation, both for training and for inference when deployed in the field. A common approach to implementing DNNs is to recast the most computationally expensive operations as general matrix multiplication (GEMM). However, as we demonstrate in this paper, there are a great many different ways to express DNN convolution operations using GEMM. Although the different approaches all perform the same number of operations, the size of their temporary data structures differs significantly. Convolution of an input matrix with dimensions C × H × W requires O(K²CHW) additional space using the classical im2col approach. More recently, memory-efficient approaches requiring just O(KCHW) auxiliary space have been proposed. We present two novel GEMM-based algorithms that require just O(MHW) and O(KW) additional space respectively,…
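The classical im2col approach mentioned in the abstract copies every K × K input patch into a column of a (K²C) × (HW) matrix, so a single GEMM with the flattened M × (K²C) kernel computes the whole convolution. The pure-Python sketch below assumes stride 1, zero "same" padding, an odd K, the cross-correlation convention used by DNN frameworks, and nested lists for tensors (input C × H × W, kernel M × C × K × K). `direct_conv`, `im2col`, `matmul` and `conv_im2col` are illustrative names, not an API from the paper.

```python
def direct_conv(inp, ker):
    """Reference 'same'-padded, stride-1 convolution, used only to check the sketch."""
    C, H, W = len(inp), len(inp[0]), len(inp[0][0])
    M, K = len(ker), len(ker[0][0])
    p = K // 2
    return [[[sum(ker[m][c][i][j] * inp[c][y + i - p][x + j - p]
                  for c in range(C) for i in range(K) for j in range(K)
                  if 0 <= y + i - p < H and 0 <= x + j - p < W)
              for x in range(W)] for y in range(H)] for m in range(M)]


def im2col(inp, K):
    """Copy every K x K patch of the zero-padded input into a (C*K*K) x (H*W) matrix."""
    C, H, W = len(inp), len(inp[0]), len(inp[0][0])
    p = K // 2
    rows = []
    for c in range(C):
        for i in range(K):
            for j in range(K):
                # Row (c, i, j): channel c shifted by (i - p, j - p), flattened over (y, x)
                rows.append([inp[c][y + i - p][x + j - p]
                             if 0 <= y + i - p < H and 0 <= x + j - p < W else 0
                             for y in range(H) for x in range(W)])
    return rows


def matmul(a, b):
    """Naive GEMM: (n x k) @ (k x m)."""
    return [[sum(a[r][t] * b[t][col] for t in range(len(b))) for col in range(len(b[0]))]
            for r in range(len(a))]


def conv_im2col(inp, ker):
    C, H, W = len(inp), len(inp[0]), len(inp[0][0])
    M, K = len(ker), len(ker[0][0])
    patches = im2col(inp, K)  # auxiliary (K*K*C) x (H*W) matrix
    # Kernel flattened to M x (C*K*K) in the same (c, i, j) order as the patch rows
    kmat = [[ker[m][c][i][j] for c in range(C) for i in range(K) for j in range(K)]
            for m in range(M)]
    flat = matmul(kmat, patches)  # M x (H*W): one GEMM does all the arithmetic
    return [[flat[m][y * W:(y + 1) * W] for y in range(H)] for m in range(M)]
```

The `patches` matrix holds K²·C·H·W elements, which is where the O(K²CHW) overhead comes from. A buffer bounded by O(MHW) or O(KW) cannot hold it, so the low-memory algorithms must avoid materialising it in full.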
Taxonomy
Topics: Advanced Neural Network Applications · Parallel Computing and Optimization Techniques · Stochastic Gradient Optimization Techniques
Methods: Convolution
