Accelerating Machine Learning Primitives on Commodity Hardware
Roman Snytsar

TL;DR
This paper investigates the Sliding Window convolution technique as a more efficient alternative to GEMM-based convolution in deep neural networks, showing significant speedups and potential for low-power, low-memory hardware deployment.
Contribution
It presents an extensive study demonstrating that Sliding Window convolution can outperform GEMM-based methods on CPUs and accelerators, promoting AI on resource-constrained devices.
Findings
Sliding Window convolution reduces memory usage compared to GEMM.
The technique achieves significant speedups in 2-D convolution.
The results suggest potential for wider adoption on low-power, low-memory AI hardware.
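The memory finding can be made concrete with a rough element count. The sketch below is illustrative only and not taken from the paper: it assumes a stride-1 convolution without padding, and the function names `im2col_elements` and `input_elements` are hypothetical. It compares the size of the patch matrix that a GEMM-based convolution typically materializes (via im2col) against the size of the original input, which is all a sliding-window kernel needs to read.

```python
def im2col_elements(height, width, channels, k):
    """Elements in the im2col patch matrix for a k x k filter.

    Assumes stride 1 and no padding (an assumption of this sketch).
    """
    out_h = height - k + 1
    out_w = width - k + 1
    # Every output position stores its own copy of a k*k*channels patch.
    return out_h * out_w * k * k * channels


def input_elements(height, width, channels):
    """Elements in the original input tensor."""
    return height * width * channels


# Hypothetical example shape: a 224 x 224 RGB image and a 3 x 3 filter.
h, w, c, k = 224, 224, 3, 3
ratio = im2col_elements(h, w, c, k) / input_elements(h, w, c)
print(f"im2col buffer is about {ratio:.2f}x the input size")
```

For large inputs the ratio approaches k squared, so the overhead grows quickly with filter size.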
Abstract
Sliding Window Sum algorithms have been successfully used for training and inference of Deep Neural Networks. We have previously shown how both 1-D pooling and 1-D convolution primitives can be expressed as sliding sums and evaluated by compute kernels with a shared structure. In this paper, we present an extensive study of the Sliding Window convolution technique as a more efficient alternative to the commonly used General Matrix Multiplication (GEMM) based convolution in Deep Neural Networks (DNNs). The Sliding Window technique addresses the memory bloating problem and demonstrates a significant speedup in 2-D convolution. We explore the performance of this technique on a range of implementations, including custom kernels for specific filter sizes. Our results suggest that the Sliding Window computation kernels can outperform GEMM-based convolution on a CPU and even on dedicated…
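The contrast the abstract draws between the two approaches can be sketched in a few lines of plain Python. This is a minimal illustration, not the author's vectorized kernels: it assumes a single-channel input, one filter, stride 1, no padding, and the cross-correlation convention common in DNN frameworks (no kernel flip). The function names `conv2d_im2col` and `conv2d_sliding` are hypothetical.

```python
def conv2d_im2col(image, kernel):
    """GEMM-style convolution: build the patch matrix, then multiply."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    flat_kernel = [v for row in kernel for v in row]
    # im2col: one row per output pixel, each holding a full copy of its patch.
    # This duplicated storage is the source of the memory bloat.
    cols = [
        [image[i + di][j + dj] for di in range(kh) for dj in range(kw)]
        for i in range(out_h)
        for j in range(out_w)
    ]
    # Matrix product step (a matrix-vector product here, since there is one filter).
    flat_out = [sum(a * b for a, b in zip(row, flat_kernel)) for row in cols]
    return [flat_out[i * out_w:(i + 1) * out_w] for i in range(out_h)]


def conv2d_sliding(image, kernel):
    """Sliding-window convolution: accumulate shifted inputs in place."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    out = [[0] * out_w for _ in range(out_h)]
    # For each filter tap, add the weighted, shifted input to the output.
    # No patch matrix is built; only the input and output buffers exist.
    for di in range(kh):
        for dj in range(kw):
            weight = kernel[di][dj]
            for i in range(out_h):
                for j in range(out_w):
                    out[i][j] += weight * image[i + di][j + dj]
    return out


image = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
kernel = [[1, 0], [0, 1]]
assert conv2d_im2col(image, kernel) == conv2d_sliding(image, kernel)
```

Both functions compute the same output; they differ in whether the intermediate patch matrix is ever stored. In a real implementation the inner loops of the sliding variant would map onto vector instructions, which this sketch does not attempt to show.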
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Ferroelectric and Negative Capacitance Devices · Advanced Neural Network Applications
Methods: Convolution
