Low-memory GEMM-based convolution algorithms for deep neural networks

Andrew Anderson; Aravind Vasudevan; Cormac Keane; David Gregg

arXiv:1709.03395 · cs.CV · September 12, 2017 · 44 citations

TL;DR

This paper introduces two novel GEMM-based convolution algorithms that substantially reduce the temporary memory needed for convolution in deep neural networks while maintaining speed. Their improved data locality also helps on multi-core systems, and the memory savings are especially beneficial for embedded devices.

Contribution

The paper presents two new low-memory GEMM-based convolution algorithms that need less auxiliary space than existing methods while matching or exceeding their speed and scaling better on multi-core processors.

Findings

1. The two algorithms require only $O(MHW)$ and $O(KW)$ additional space, respectively.
2. Their performance matches or exceeds that of patch-building approaches such as im2col.
3. They perform especially well on multi-core systems because of their better data locality.
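To give a sense of scale, the sketch below evaluates the leading term of each auxiliary-space bound quoted on this page for one hypothetical layer (C = M = 256 channels, a 56 × 56 feature map, 3 × 3 kernels). Constant factors are dropped, so these are rough element counts, not the exact buffer sizes of any implementation. We assume $K$ denotes the kernel width and $M$ the number of output channels (kernels); the function name `aux_space` is hypothetical.

```python
# Leading-term auxiliary-space estimates (element counts, constants dropped)
# for the bounds quoted on this page. Assumption: K = kernel width,
# M = number of output channels. The layer sizes used below are hypothetical.

def aux_space(C, H, W, M, K):
    """Return the leading term of each bound, in elements."""
    return {
        "im2col, O(K^2 CHW)": K * K * C * H * W,
        "memory-efficient, O(KCHW)": K * C * H * W,
        "first new algorithm, O(MHW)": M * H * W,
        "second new algorithm, O(KW)": K * W,
    }


for name, n in aux_space(C=256, H=56, W=56, M=256, K=3).items():
    print(f"{name:32s}{n:>12,d} elements")
```

For this layer the im2col term is 7,225,344 elements against 168 for the $O(KW)$ term; because only the latter is independent of $H$ and $C$, the gap widens as feature maps and channel counts grow.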

Abstract

Deep neural networks (DNNs) require very large amounts of computation both for training and for inference when deployed in the field. A common approach to implementing DNNs is to recast the most computationally expensive operations as general matrix multiplication (GEMM). However, as we demonstrate in this paper, there are a great many different ways to express DNN convolution operations using GEMM. Although different approaches all perform the same number of operations, the size of temporary data structures differs significantly. Convolution of an input matrix with dimensions $C \times H \times W$ requires $O(K^2 CHW)$ additional space using the classical im2col approach. More recently, memory-efficient approaches requiring just $O(KCHW)$ auxiliary space have been proposed. We present two novel GEMM-based algorithms that require just $O(MHW)$ and $O(KW)$ additional space respectively,…
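The space contrast in the abstract can be made concrete with a small pure-Python sketch. `conv_im2col` materializes the classical $(K^2 C) \times (HW)$ patch matrix and performs one GEMM. `conv_shift_accumulate` instead treats the $K \times K$ convolution as $K^2$ separate $1 \times 1$ convolutions, each a single GEMM whose $M \times HW$ result is shift-added into the output, so the only large temporary is $O(MHW)$. The second function is our illustration of how an $O(MHW)$ bound can arise, not a claim about either of the paper's exact algorithms. All function names are hypothetical, and we assume stride 1, an odd kernel size, zero padding that keeps the output at $H \times W$, and DNN-style (cross-correlation) convolution.

```python
# Pure-Python sketch (no NumPy) contrasting two GEMM-based convolutions.
# Assumptions: stride 1, odd K, zero "same" padding so the output is M x H x W,
# nested-list tensors, and DNN-style convolution (cross-correlation).

def gemm(a, b):
    """Plain matrix product of a (n x k) and b (k x m)."""
    k, m = len(b), len(b[0])
    return [[sum(row[t] * b[t][q] for t in range(k)) for q in range(m)] for row in a]


def direct_conv(x, w):
    """Reference loop: x is C x H x W, w is M x C x K x K."""
    C, H, W = len(x), len(x[0]), len(x[0][0])
    M, K = len(w), len(w[0][0])
    p = K // 2
    out = [[[0.0] * W for _ in range(H)] for _ in range(M)]
    for m in range(M):
        for i in range(H):
            for j in range(W):
                for c in range(C):
                    for ki in range(K):
                        for kj in range(K):
                            y, z = i + ki - p, j + kj - p
                            if 0 <= y < H and 0 <= z < W:
                                out[m][i][j] += w[m][c][ki][kj] * x[c][y][z]
    return out


def im2col(x, K):
    """(C*K*K) x (H*W) patch matrix: the O(K^2 CHW) temporary."""
    C, H, W = len(x), len(x[0]), len(x[0][0])
    p = K // 2
    return [[x[c][i + ki - p][j + kj - p]
             if 0 <= i + ki - p < H and 0 <= j + kj - p < W else 0.0
             for i in range(H) for j in range(W)]
            for c in range(C) for ki in range(K) for kj in range(K)]


def conv_im2col(x, w):
    """One GEMM: (M x C*K*K) kernel matrix times the patch matrix."""
    M, C, K = len(w), len(w[0]), len(w[0][0])
    H, W = len(x[0]), len(x[0][0])
    # Flatten each kernel in the same (c, ki, kj) order that im2col uses for rows.
    wmat = [[w[m][c][ki][kj] for c in range(C) for ki in range(K) for kj in range(K)]
            for m in range(M)]
    flat = gemm(wmat, im2col(x, K))  # M x (H*W)
    return [[flat[m][i * W:(i + 1) * W] for i in range(H)] for m in range(M)]


def conv_shift_accumulate(x, w):
    """K*K GEMMs of (M x C) by (C x HW); each result is held in one
    M x HW temporary (O(MHW) space) and shift-added into the output."""
    M, C, K = len(w), len(w[0]), len(w[0][0])
    H, W = len(x[0]), len(x[0][0])
    p = K // 2
    # A contiguous C x H x W array already is this C x HW matrix; nested lists need a copy.
    xmat = [[x[c][i][j] for i in range(H) for j in range(W)] for c in range(C)]
    out = [[[0.0] * W for _ in range(H)] for _ in range(M)]
    for ki in range(K):
        for kj in range(K):
            wk = [[w[m][c][ki][kj] for c in range(C)] for m in range(M)]  # M x C slice
            tmp = gemm(wk, xmat)  # the only large temporary: M x (H*W)
            for m in range(M):
                for i in range(H):
                    for j in range(W):
                        y, z = i + ki - p, j + kj - p
                        if 0 <= y < H and 0 <= z < W:
                            out[m][i][j] += tmp[m][y * W + z]
    return out


# Tiny example: C=2, H=W=4, M=3, K=3 (integer-valued floats, so sums are exact).
x = [[[float(c * 16 + i * 4 + j) for j in range(4)] for i in range(4)] for c in range(2)]
w = [[[[float((m + c + 3 * ki + kj) % 5 - 2) for kj in range(3)] for ki in range(3)]
      for c in range(2)] for m in range(3)]
ref = direct_conv(x, w)
assert conv_im2col(x, w) == ref
assert conv_shift_accumulate(x, w) == ref
patches = im2col(x, 3)
print(len(patches) * len(patches[0]))  # 288 = K*K*C*H*W = 9*2*4*4
```

Both functions agree with the direct loop on this example. The trade-off is that the second performs $K^2$ smaller GEMMs (inner dimension $C$ instead of $K^2 C$) in exchange for never building the patch matrix.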

Taxonomy

Topics: Advanced Neural Network Applications · Parallel Computing and Optimization Techniques · Stochastic Gradient Optimization Techniques

Methods: Convolution