High Performance and Portable Convolution Operators for ARM-based Multicore Processors
Pablo San Juan, Adrián Castelló, Manuel F. Dolz, Pedro Alonso-Jordá, Enrique S. Quintana-Ortí

TL;DR
This paper introduces a portable, high-performance convolution algorithm for ARM multicore processors that eliminates intermediate memory and IM2COL transform costs by leveraging the BLIS GEMM kernel structure.
Contribution
It presents a novel convolution method that avoids intermediate memory and reduces computation time, enhancing performance and portability on ARM multicore systems.
Findings
Achieves high performance comparable to existing methods
Reduces memory usage significantly
Maintains portability across ARM architectures
Abstract
The considerable impact of Convolutional Neural Networks on many Artificial Intelligence tasks has led to the development of various high performance algorithms for the convolution operator present in this type of network. One of these approaches leverages the IM2COL transform followed by a general matrix multiplication (GEMM) in order to take advantage of the highly optimized realizations of the GEMM kernel in many linear algebra libraries. The main problems of this approach are 1) the large memory workspace required to host the intermediate matrices generated by the IM2COL transform; and 2) the time to perform the IM2COL transform, which is not negligible for complex neural networks. This paper presents a portable high performance convolution algorithm based on the BLIS realization of the GEMM kernel that avoids the use of the intermediate memory by taking advantage of the BLIS…
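To make the baseline described in the abstract concrete, here is a minimal sketch of the IM2COL-plus-GEMM approach in plain Python. It is not the paper's algorithm and does not use BLIS. The function names (`im2col`, `gemm`, `conv_im2col`, `conv_direct`) are hypothetical, the naive `gemm` only stands in for an optimized library kernel, and the sketch assumes stride 1, no padding, and a single input image stored as nested lists.

```python
def im2col(x, kh, kw):
    """Flatten every kh x kw patch of a C x H x W input into one column.

    Returns a (C*kh*kw) x (Ho*Wo) matrix as a list of rows.
    Assumes stride 1 and no padding.
    """
    c, h, w = len(x), len(x[0]), len(x[0][0])
    ho, wo = h - kh + 1, w - kw + 1
    rows = []
    for ci in range(c):
        for i in range(kh):
            for j in range(kw):
                # One row per (channel, kernel offset); one column per output pixel.
                rows.append([x[ci][oy + i][ox + j]
                             for oy in range(ho) for ox in range(wo)])
    return rows


def gemm(a, b):
    """Naive matrix product standing in for an optimized GEMM kernel."""
    n = len(b[0])
    return [[sum(a_ik * b[k][j] for k, a_ik in enumerate(row)) for j in range(n)]
            for row in a]


def conv_im2col(x, filters, kh, kw):
    """Convolution as GEMM: (K x C*kh*kw) filter matrix times the IM2COL matrix."""
    f_mat = [[f[ci][i][j]
              for ci in range(len(f)) for i in range(kh) for j in range(kw)]
             for f in filters]
    return gemm(f_mat, im2col(x, kh, kw))


def conv_direct(x, filters, kh, kw):
    """Reference direct convolution that needs no intermediate matrix."""
    c, h, w = len(x), len(x[0]), len(x[0][0])
    ho, wo = h - kh + 1, w - kw + 1
    return [[sum(f[ci][i][j] * x[ci][oy + i][ox + j]
                 for ci in range(c) for i in range(kh) for j in range(kw))
             for oy in range(ho) for ox in range(wo)]
            for f in filters]


# Small example: one 3 x 3 input channel, one 2 x 2 filter.
x = [[[1, 2, 3], [4, 5, 6], [7, 8, 9]]]
filters = [[[[1, 0], [0, 1]]]]

cols = im2col(x, 2, 2)
workspace_elems = len(cols) * len(cols[0])  # size of the intermediate matrix
input_elems = 9                             # size of the original input
print(workspace_elems, input_elems)
print(conv_im2col(x, filters, 2, 2) == conv_direct(x, filters, 2, 2))
```

In this toy case the intermediate matrix holds 16 elements for a 9-element input, because overlapping patches are replicated. That replication, and the time spent building it, are the two costs the abstract attributes to the IM2COL approach.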
Taxonomy
Topics: Parallel Computing and Optimization Techniques · Numerical Methods and Algorithms · Digital Filter Design and Implementation
