Accelerating Machine Learning Primitives on Commodity Hardware

Roman Snytsar

arXiv:2310.05218 · cs.LG · October 10, 2023

TL;DR

This paper investigates the Sliding Window convolution technique as a more efficient alternative to GEMM-based convolution in deep neural networks, showing significant speedups and potential for low-power, low-memory hardware deployment.

Contribution

It presents an extensive study demonstrating that Sliding Window convolution can outperform GEMM-based methods on CPUs and accelerators, promoting AI on resource-constrained devices.
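
To make the contrast with GEMM-based convolution concrete, here is a minimal pure-Python sketch under stated assumptions: stride 1, no padding, a single channel, and "convolution" meaning cross-correlation as in most DNN frameworks. It builds a 2-D convolution from 1-D sliding passes over input rows and checks it against an im2col-plus-matrix-product reference. The function names are hypothetical, and this scalar code is not the author's vectorized or filter-size-specific kernel.

```python
from typing import List

Matrix = List[List[float]]


def conv1d_valid(x: List[float], w: List[float]) -> List[float]:
    """1-D 'valid' cross-correlation: slide w across x, one dot product per position."""
    k = len(w)
    return [sum(w[t] * x[i + t] for t in range(k)) for i in range(len(x) - k + 1)]


def conv2d_sliding(x: Matrix, w: Matrix) -> Matrix:
    """2-D 'valid' convolution built only from 1-D sliding passes.

    Output row i is the sum over filter rows r of conv1d(input row i + r, filter row r),
    so no unrolled (im2col) copy of the input is ever materialized.
    """
    kh = len(w)
    out = []
    for i in range(len(x) - kh + 1):
        acc = conv1d_valid(x[i], w[0])
        for r in range(1, kh):
            row = conv1d_valid(x[i + r], w[r])
            acc = [a + b for a, b in zip(acc, row)]
        out.append(acc)
    return out


def conv2d_im2col(x: Matrix, w: Matrix) -> Matrix:
    """Reference GEMM-style convolution: unroll every kh x kw patch into a column."""
    kh, kw = len(w), len(w[0])
    out_h, out_w = len(x) - kh + 1, len(x[0]) - kw + 1
    # im2col buffer: (kh*kw) rows x (out_h*out_w) columns, up to kh*kw times the input size
    cols = [[x[i + r][j + c] for i in range(out_h) for j in range(out_w)]
            for r in range(kh) for c in range(kw)]
    flat_w = [w[r][c] for r in range(kh) for c in range(kw)]
    # (1 x kh*kw) times (kh*kw x out_h*out_w) matrix product
    prod = [sum(flat_w[p] * cols[p][q] for p in range(kh * kw))
            for q in range(out_h * out_w)]
    return [prod[i * out_w:(i + 1) * out_w] for i in range(out_h)]
```

For example, with x = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]] and w = [[1, 2], [3, 4]], both functions return [[44, 54, 64], [84, 94, 104]]; only the im2col version allocates the intermediate patch matrix.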

Findings

01

Sliding Window convolution reduces memory usage compared to GEMM.

02

The technique achieves significant speedups in 2-D convolution.

03

The technique shows potential for wider adoption in low-power AI hardware.
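
The memory finding can be illustrated with simple arithmetic. GEMM-based convolution usually first unrolls the input with im2col, so each input value is copied once per filter tap it participates in; with stride 1 and "same" padding, the buffer is K·K times the input size for a K×K filter. The sketch below computes that buffer size for an illustrative layer shape (64 channels, 56×56, 3×3 filter, fp32) chosen as an assumption, not taken from the paper; the helper names are hypothetical.

```python
def conv_out_size(n: int, k: int, stride: int = 1, pad: int = 0) -> int:
    """Standard output length of a convolution along one spatial axis."""
    return (n + 2 * pad - k) // stride + 1


def im2col_buffer_bytes(c_in: int, h: int, w: int, k: int,
                        stride: int = 1, pad: int = 0, dtype_bytes: int = 4) -> int:
    """Bytes of the (c_in*k*k) x (out_h*out_w) im2col matrix fed to GEMM."""
    out_h = conv_out_size(h, k, stride, pad)
    out_w = conv_out_size(w, k, stride, pad)
    return c_in * k * k * out_h * out_w * dtype_bytes


def input_bytes(c_in: int, h: int, w: int, dtype_bytes: int = 4) -> int:
    """Bytes of the input tensor itself, which a sliding-window kernel reads in place."""
    return c_in * h * w * dtype_bytes


if __name__ == "__main__":
    print(input_bytes(64, 56, 56))                   # 802816
    print(im2col_buffer_bytes(64, 56, 56, 3, pad=1))  # 7225344 (9x the input)
```

A sliding-window kernel avoids this extra buffer entirely, which matters most on devices with little memory.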

Abstract

Sliding Window Sum algorithms have been used successfully for both training and inference of Deep Neural Networks. We have previously shown how 1-D pooling and 1-D convolution primitives can both be expressed as sliding sums and evaluated by compute kernels that share a common structure. In this paper, we present an extensive study of the Sliding Window convolution technique as a more efficient alternative to the commonly used General Matrix Multiplication (GEMM) based convolution in Deep Neural Networks (DNNs). The Sliding Window technique addresses the memory-bloat problem of GEMM-based convolution and demonstrates a significant speedup in 2-D convolution. We explore the performance of this technique across a range of implementations, including custom kernels for specific filter sizes. Our results suggest that Sliding Window computation kernels can outperform GEMM-based convolution on a CPU and even on dedicated…
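
The idea that pooling and convolution share one kernel structure can be sketched as a single window-reduction skeleton parameterized by an initial value and a combine step. This is a scalar illustration under stated assumptions (1-D, stride 1, no padding), not the vectorized Sliding Window Sum kernel from the author's earlier work; all function names are hypothetical. The last function shows the classic running-sum trick, which reuses the previous window's total instead of re-adding all k values.

```python
from typing import Callable, List, TypeVar

T = TypeVar("T")


def sliding_window(x: List[float], k: int, init: T,
                   combine: Callable[[T, int, float], T]) -> List[T]:
    """Shared skeleton: for each of the len(x)-k+1 windows, fold combine over its k taps."""
    out = []
    for i in range(len(x) - k + 1):
        acc = init
        for t in range(k):
            acc = combine(acc, t, x[i + t])
        out.append(acc)
    return out


def max_pool1d(x: List[float], k: int) -> List[float]:
    return sliding_window(x, k, float("-inf"), lambda acc, t, v: max(acc, v))


def sum_pool1d(x: List[float], k: int) -> List[float]:
    return sliding_window(x, k, 0.0, lambda acc, t, v: acc + v)


def conv1d(x: List[float], w: List[float]) -> List[float]:
    # Same skeleton; the combine step just weights each tap by w[t].
    return sliding_window(x, len(w), 0.0, lambda acc, t, v: acc + w[t] * v)


def sum_pool1d_running(x: List[float], k: int) -> List[float]:
    """O(n) sliding sum: add the entering element, subtract the leaving one."""
    if len(x) < k:
        return []
    s = sum(x[:k])
    out = [s]
    for i in range(k, len(x)):
        s += x[i] - x[i - k]
        out.append(s)
    return out
```

For x = [1, 3, 2, 5, 4] and k = 3, max_pool1d gives [3, 5, 5], both sum variants give [6, 10, 11], and conv1d(x, [1, 0, -1]) gives [-1, -2, -2]. The running-sum shortcut relies on subtraction undoing addition, so it does not apply directly to max pooling.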

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Ferroelectric and Negative Capacitance Devices · Advanced Neural Network Applications

Methods: Convolution