cuConv: A CUDA Implementation of Convolution for CNN Inference

Marc Jordà; Pedro Valero-Lara; Antonio J. Peña

arXiv:2103.16234·cs.DC·October 28, 2024

TL;DR

This paper introduces cuConv, a CUDA-based convolution implementation for CNN inference that improves efficiency by favoring coalesced memory accesses. It achieves speedups of up to 2.29x over the best convolution implementation in cuDNN across a range of common forward-propagation configurations.

Contribution

The paper presents a novel GPU convolution implementation that improves performance without requiring prior data transformations, and it outperforms cuDNN's convolution implementations in several commonly used CNN configurations.
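"Without data transformations" means the convolution is computed directly on the input tensor rather than after first lowering it into a matrix (as im2col-plus-GEMM approaches do). The following minimal CPU reference makes that direct loop structure concrete. It is an illustration only, not cuConv's kernel. It assumes NCHW input, KCRS filters, float data, stride 1, no padding and no dilation, and the function name `direct_conv_nchw` is hypothetical.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical reference: direct forward convolution with no im2col lowering.
// Assumptions: input is NCHW, filters are KCRS, stride 1, no padding/dilation.
// Like most CNN frameworks, it computes cross-correlation (filter not flipped).
std::vector<float> direct_conv_nchw(const std::vector<float>& in,
                                    const std::vector<float>& filt,
                                    int N, int C, int H, int W,
                                    int K, int R, int S) {
    const int P = H - R + 1;  // output height
    const int Q = W - S + 1;  // output width
    std::vector<float> out(static_cast<std::size_t>(N) * K * P * Q, 0.0f);
    for (int n = 0; n < N; ++n)
        for (int k = 0; k < K; ++k)
            for (int p = 0; p < P; ++p)
                for (int q = 0; q < Q; ++q) {
                    float acc = 0.0f;
                    // Reduce over input channels and the R x S filter window.
                    for (int c = 0; c < C; ++c)
                        for (int r = 0; r < R; ++r)
                            for (int s = 0; s < S; ++s) {
                                std::size_t ii =
                                    ((static_cast<std::size_t>(n) * C + c) * H + (p + r)) * W + (q + s);
                                std::size_t fi =
                                    ((static_cast<std::size_t>(k) * C + c) * R + r) * S + s;
                                acc += in[ii] * filt[fi];
                            }
                    out[((static_cast<std::size_t>(n) * K + k) * P + p) * Q + q] = acc;
                }
    return out;
}
```

For example, a 1x1x3x3 input holding 1 to 9 convolved with a single 2x2 filter of ones yields the four window sums 12, 16, 24 and 28. A GPU version would assign output elements to threads, and the choice of which loop index maps to consecutive threads determines how well the global-memory loads coalesce.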

Findings

1. Achieves up to a 2.29x speedup over cuDNN.
2. Favors coalesced memory accesses for CNN inference.
3. Is effective across multiple common CNN convolution configurations.

Abstract

Convolutions are the core operation of deep learning applications based on Convolutional Neural Networks (CNNs). Current GPU architectures are highly efficient for training and deploying deep CNNs, and hence GPUs are widely used in production for this purpose. State-of-the-art implementations, however, are inefficient for some commonly used network configurations. In this paper we propose a GPU-based implementation of the convolution operation for CNN inference that favors coalesced accesses without requiring prior data transformations. Our experiments demonstrate that our proposal yields notable performance improvements in a range of common CNN forward-propagation convolution configurations, with speedups of up to 2.29x with respect to the best implementation of convolution in cuDNN, hence covering a relevant region of the configuration space that is not well served by currently existing approaches.
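Coalescing is the design goal the abstract emphasizes: when the 32 threads of a warp load adjacent addresses, the hardware can serve the whole warp with few memory transactions. The sketch below counts transactions under a deliberately simplified model. It assumes 4-byte loads served in aligned 128-byte segments. Real NVIDIA GPUs track 32-byte sectors within 128-byte cache lines, so this is only a first-order approximation. `segments_per_warp_load` is a hypothetical helper, not part of cuConv or cuDNN.

```cpp
#include <cstddef>
#include <set>

// Simplified model (assumption): one warp = 32 lanes, each issuing one 4-byte
// float load; memory is served in aligned 128-byte segments.
constexpr int kWarpSize = 32;
constexpr std::size_t kSegmentBytes = 128;

// Hypothetical helper: counts the distinct 128-byte segments touched when
// lane t loads the float at element index base + t * stride_elems.
// Fewer segments per warp means a more coalesced access.
int segments_per_warp_load(std::size_t base, std::size_t stride_elems) {
    std::set<std::size_t> segments;
    for (int lane = 0; lane < kWarpSize; ++lane) {
        std::size_t byte_addr = (base + lane * stride_elems) * sizeof(float);
        segments.insert(byte_addr / kSegmentBytes);
    }
    return static_cast<int>(segments.size());
}
```

Under this model, lanes mapped to consecutive output columns of an NCHW direct convolution read consecutive input elements (stride 1). They touch one segment when the base is aligned and two when it is not. By contrast, lanes reading KCRS filter elements for consecutive output channels with C = 64 and R = S = 3 are 576 elements apart, so every lane touches its own segment (32 in total).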

Taxonomy

Methods: Convolution