High Performance Zero-Memory Overhead Direct Convolutions
Jiyuan Zhang, Franz Franchetti, Tze Meng Low

TL;DR
This paper presents a direct convolution method that eliminates memory overhead and significantly improves performance, especially on embedded devices, outperforming existing routines by up to 400% and scaling better with multithreading.
Contribution
The paper introduces a correctly implemented direct convolution approach that removes memory overhead and achieves superior performance and scalability compared to traditional high-performance routines.
Findings
Performance improved by 10% to 400% over existing methods
Memory overhead is eliminated with direct convolution
Better scaling with increased threading
Abstract
The computation of convolution layers in deep neural networks typically relies on high performance routines that trade space for time by using additional memory (either for packing purposes or required as part of the algorithm) to improve performance. The problems with such an approach are two-fold. First, these routines incur additional memory overhead, which reduces the overall size of the network that can fit on embedded devices with limited memory capacity. Second, these high performance routines were not optimized for performing convolution, which means that the performance obtained is usually less than conventionally expected. In this paper, we demonstrate that direct convolution, when implemented correctly, eliminates all memory overhead, and yields performance that is between 10% and 400% better than existing high performance implementations of convolution layers on…
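To make the space-for-time trade-off in the abstract concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: the function names `im2col_buffer_elems` and `direct_conv2d`, the channel-major list layout, and the no-padding assumption are all illustrative choices. The first function counts the elements of the packed matrix that an im2col-style routine would allocate; the second computes the same layer (as the cross-correlation used in deep learning frameworks) by looping directly over the input, allocating nothing beyond the output itself. It does not attempt the loop ordering or data layout tuning that a high performance implementation would need.

```python
def im2col_buffer_elems(c_in, h, w, kh, kw, stride=1):
    """Element count of the packed matrix an im2col-style routine allocates.

    Hypothetical helper for illustration; assumes no padding.
    """
    h_out = (h - kh) // stride + 1
    w_out = (w - kw) // stride + 1
    # One column of c_in * kh * kw values per output pixel.
    return (c_in * kh * kw) * (h_out * w_out)


def direct_conv2d(x, weights, stride=1):
    """Direct convolution with no auxiliary buffer (illustrative sketch).

    x:       nested lists shaped [c_in][h][w]
    weights: nested lists shaped [c_out][c_in][kh][kw]
    Returns nested lists shaped [c_out][h_out][w_out]. No padding is applied.
    """
    c_in, h, w = len(x), len(x[0]), len(x[0][0])
    c_out = len(weights)
    kh, kw = len(weights[0][0]), len(weights[0][0][0])
    h_out = (h - kh) // stride + 1
    w_out = (w - kw) // stride + 1

    # The only allocation is the output tensor itself.
    out = [[[0.0] * w_out for _ in range(h_out)] for _ in range(c_out)]

    for o in range(c_out):
        for i in range(c_in):
            for y in range(h_out):
                for col in range(w_out):
                    acc = 0.0
                    # Read the input window in place instead of copying it.
                    for r in range(kh):
                        for s in range(kw):
                            acc += x[i][y * stride + r][col * stride + s] * weights[o][i][r][s]
                    out[o][y][col] += acc
    return out
```

For a 3x3 single-channel input and a 2x2 filter, `im2col_buffer_elems(1, 3, 3, 2, 2)` reports a 16-element packed buffer for a 9-element input, while `direct_conv2d` reads those 9 elements in place. The gap grows with filter size and channel count, which is the overhead the abstract refers to.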
Taxonomy
Topics: Advanced Neural Network Applications · Parallel Computing and Optimization Techniques · Advanced Memory and Neural Computing
