cuConv: A CUDA Implementation of Convolution for CNN Inference
Marc Jordà, Pedro Valero-Lara, Antonio J. Peña

TL;DR
This paper introduces cuConv, a CUDA-based convolution implementation for CNN inference that favors coalesced memory accesses without prior data transformations, achieving speedups of up to 2.29x over the best cuDNN convolution implementation in a range of common configurations.
Contribution
The paper presents a novel GPU convolution implementation that enhances performance without data transformations, outperforming existing cuDNN methods in key CNN configurations.
Findings
Achieves speedups of up to 2.29x over the best convolution implementation in cuDNN
Favors coalesced memory accesses without requiring prior data transformations
Effective across a range of common CNN forward-propagation convolution configurations
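To illustrate why coalesced accesses matter, the following is a minimal sketch of a simplified memory model, not the paper's measurement method. It counts how many memory segments one warp-wide load touches when 32 threads read 4-byte floats. The 128-byte segment size and the helper names `segments_touched` and `warp_indices` are assumptions made for illustration; real GPUs service requests at finer granularity and with caching effects this model ignores.

```python
WARP_SIZE = 32        # threads per warp on NVIDIA GPUs
SEGMENT_BYTES = 128   # assumed transaction granularity (simplified model)
FLOAT_BYTES = 4       # size of one float32 element


def segments_touched(element_indices):
    """Count the distinct memory segments hit by one warp-wide load."""
    return len({(i * FLOAT_BYTES) // SEGMENT_BYTES for i in element_indices})


def warp_indices(stride, base=0):
    """Element index read by each thread of a warp for a given stride."""
    return [base + t * stride for t in range(WARP_SIZE)]


# Adjacent threads read adjacent floats: the whole warp fits in one segment.
coalesced = segments_touched(warp_indices(stride=1))

# Adjacent threads read floats 56 elements apart (e.g. one per image row of
# width 56): every thread lands in a different segment.
strided = segments_touched(warp_indices(stride=56))

print(coalesced, strided)  # prints "1 32"
```

Under this model the strided pattern needs 32 times as many segments as the contiguous one for the same amount of useful data, which is the kind of cost a layout-aware kernel tries to avoid.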
Abstract
Convolutions are the core operation of deep learning applications based on Convolutional Neural Networks (CNNs). Current GPU architectures are highly efficient for training and deploying deep CNNs, and hence, these are largely used in production for this purpose. State-of-the-art implementations, however, present a lack of efficiency for some commonly used network configurations. In this paper we propose a GPU-based implementation of the convolution operation for CNN inference that favors coalesced accesses, without requiring prior data transformations. Our experiments demonstrate that our proposal yields notable performance improvements in a range of common CNN forward propagation convolution configurations, with speedups of up to 2.29x with respect to the best implementation of convolution in cuDNN, hence covering a relevant region in currently existing approaches.
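As a rough illustration of what "without requiring prior data transformations" can mean, here is a minimal CPU reference sketch of direct convolution that reads the input in its original layout instead of first building an im2col-style matrix. It is not the cuConv kernel: the NCHW and KCRS flat layouts, stride 1, no padding, and the function name `conv2d_direct` are all assumptions chosen for the example. As is usual in CNNs, the filter is not flipped (cross-correlation).

```python
def conv2d_direct(x, w, N, C, H, W, K, R, S):
    """Direct convolution, stride 1, no padding.

    x: flat list in NCHW order, w: flat list in KCRS order.
    Returns a flat list in N x K x P x Q order.
    """
    P, Q = H - R + 1, W - S + 1          # output height and width
    y = [0.0] * (N * K * P * Q)
    for n in range(N):
        for k in range(K):
            for p in range(P):
                # On a GPU, consecutive threads would typically take
                # consecutive q, so their reads x[... + q + s] are contiguous.
                for q in range(Q):
                    acc = 0.0
                    for c in range(C):
                        for r in range(R):
                            for s in range(S):
                                xi = ((n * C + c) * H + (p + r)) * W + (q + s)
                                wi = ((k * C + c) * R + r) * S + s
                                acc += x[xi] * w[wi]
                    y[((n * K + k) * P + p) * Q + q] = acc
    return y


# 3x3 single-channel input holding 1..9, one 2x2 filter that sums the
# top-left and bottom-right taps of each window.
x = [float(v) for v in range(1, 10)]
w = [1.0, 0.0, 0.0, 1.0]
y = conv2d_direct(x, w, N=1, C=1, H=3, W=3, K=1, R=2, S=2)
print(y)  # prints [6.0, 8.0, 12.0, 14.0]
```

The index arithmetic shows the point of interest: for fixed n, c, p, and r, the input offsets grow by one as q grows by one, so mapping q to adjacent threads would give adjacent addresses without rearranging the input first.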
Taxonomy
Methods: Convolution
