High Performance Zero-Memory Overhead Direct Convolutions

Jiyuan Zhang; Franz Franchetti; Tze Meng Low

arXiv:1809.10170·cs.LG·September 28, 2018·32 cites

Open Access

TL;DR

This paper presents a direct convolution method that eliminates memory overhead and significantly improves performance, especially on embedded devices, outperforming existing routines by up to 400% and scaling better with multithreading.

Contribution

The paper introduces a correctly implemented direct convolution approach that removes memory overhead and achieves superior performance and scalability compared to traditional high-performance routines.

Findings

01  Performance improved by 10% to 400% over existing methods

02  Memory overhead is eliminated with direct convolution

03  Better scaling with increased threading
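
To give a sense of what "memory overhead" can mean here, the sketch below counts the extra buffer elements needed by im2col, one common space-for-time technique that lowers convolution to matrix multiplication. This page does not say which routines the paper compares against, so im2col is an assumption chosen for illustration, and the layer shape and helper names are hypothetical.

```python
# Illustrative only: im2col is assumed as an example of a packing-based
# routine; the layer shape below is hypothetical, not taken from the paper.

def conv_output_size(size, kernel, stride=1, pad=0):
    """Spatial output size of a convolution along one dimension."""
    return (size + 2 * pad - kernel) // stride + 1

def im2col_extra_elements(c_in, h, w, k_h, k_w, stride=1, pad=0):
    """Elements in the im2col buffer: one column of c_in*k_h*k_w values
    per output pixel."""
    h_out = conv_output_size(h, k_h, stride, pad)
    w_out = conv_output_size(w, k_w, stride, pad)
    return c_in * k_h * k_w * h_out * w_out

def input_elements(c_in, h, w):
    """Elements in the original input tensor."""
    return c_in * h * w

# Hypothetical layer: 64 input channels, 56x56 image, 3x3 kernel, pad 1.
extra = im2col_extra_elements(64, 56, 56, 3, 3, stride=1, pad=1)
base = input_elements(64, 56, 56)
print(f"im2col buffer: {extra} elements, input: {base} elements, "
      f"ratio: {extra / base:.1f}x")
```

For a 3x3 kernel with stride 1 and "same" padding, the lowered buffer holds nine times as many elements as the input itself. A direct convolution needs none of this buffer.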

Abstract

The computation of convolution layers in deep neural networks typically relies on high performance routines that trade space for time by using additional memory (either for packing purposes or required as part of the algorithm) to improve performance. The problems with such an approach are twofold. First, these routines incur additional memory overhead, which reduces the overall size of the network that can fit on embedded devices with limited memory capacity. Second, these high performance routines were not optimized for performing convolution, which means that the performance obtained is usually less than conventionally expected. In this paper, we demonstrate that direct convolution, when implemented correctly, eliminates all memory overhead and yields performance that is between 10% and 400% better than existing high performance implementations of convolution layers on…
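
As a rough illustration of what "direct convolution" means, the sketch below computes a convolution layer with a plain loop nest that reads the input and weights in place and writes straight into the output, allocating no intermediate buffer. It is a naive reference version in pure Python, not the paper's implementation: the loop ordering, data layout, and blocking that the authors use to reach high performance are not described on this page. The function name and tensor layout (`inp[c][y][x]`, `weights[o][c][ky][kx]`) are assumptions made for the example.

```python
# Naive reference sketch, assuming no padding and a channel-major layout.
# As is conventional for neural networks, the kernel is not flipped
# (strictly speaking this is cross-correlation).

def direct_conv2d(inp, weights, stride=1):
    """inp[c][y][x], weights[o][c][ky][kx] -> out[o][y][x]."""
    c_in = len(inp)
    h, w = len(inp[0]), len(inp[0][0])
    c_out = len(weights)
    k_h, k_w = len(weights[0][0]), len(weights[0][0][0])
    h_out = (h - k_h) // stride + 1
    w_out = (w - k_w) // stride + 1

    # The output tensor is the only allocation; nothing is packed or copied.
    out = [[[0.0] * w_out for _ in range(h_out)] for _ in range(c_out)]

    for o in range(c_out):              # output channels
        for y in range(h_out):          # output rows
            for x in range(w_out):      # output columns
                acc = 0.0
                for c in range(c_in):   # input channels
                    for ky in range(k_h):
                        for kx in range(k_w):
                            acc += (inp[c][y * stride + ky][x * stride + kx]
                                    * weights[o][c][ky][kx])
                out[o][y][x] = acc
    return out

# One 3x3 input channel and one 2x2 kernel that sums the main diagonal.
image = [[[1.0, 2.0, 3.0],
          [4.0, 5.0, 6.0],
          [7.0, 8.0, 9.0]]]
kernel = [[[[1.0, 0.0],
            [0.0, 1.0]]]]
result = direct_conv2d(image, kernel)
print(result)  # [[[6.0, 8.0], [12.0, 14.0]]]
```

The six loops touch only the input, the weights, and the output, which is why this formulation has no memory overhead beyond the tensors the layer already needs.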

Taxonomy

Topics: Advanced Neural Network Applications · Parallel Computing and Optimization Techniques · Advanced Memory and Neural Computing