Im2win: An Efficient Convolution Paradigm on GPU
Shuai Lu, Jun Chu, Luanzheng Guo, Xu T. Liu

TL;DR
This paper introduces im2win, a GPU convolution paradigm that reduces memory usage and improves performance by ensuring continuous memory access, outperforming existing methods in deep neural network computations.
Contribution
The paper proposes the im2win convolution method on GPU, combining memory efficiency with high performance, and demonstrates its superiority over existing approaches through extensive benchmarks.
Findings
Reduces memory footprint by up to 32.8%.
Achieves up to 3.5× the TFLOPS of cuBLAS.
Outperforms cuDNN and direct convolution in speed and memory efficiency.
Abstract
Convolution is the most time-consuming operation in deep neural networks, so its performance is critical to overall network performance. Commonly used methods for convolution on GPU include general matrix multiplication (GEMM)-based convolution and direct convolution. GEMM-based convolution relies on the im2col algorithm, which results in a large memory footprint and reduced performance. Direct convolution avoids the large memory footprint, but its performance is not on par with the GEMM-based approach because of discontinuous memory access. This paper proposes a window-order-based convolution paradigm on GPU, called im2win, which not only reduces the memory footprint but also offers continuous memory accesses, resulting in improved performance. Furthermore, we apply a range of optimization techniques on the convolution CUDA kernel,…
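The trade-off between im2col-based and direct convolution described in the abstract can be illustrated with a minimal CPU sketch. It assumes a single-channel image, a square kernel, and no padding. The function names (`im2col`, `conv_gemm`, `conv_direct`) are hypothetical and chosen only for illustration. This is not the paper's im2win kernel or its CUDA implementation. It only shows why im2col enlarges the memory footprint while direct convolution reads the input in place.

```python
def im2col(image, k, stride=1):
    """Unfold every k x k window of a 2D image into its own row.

    Overlapping windows are copied, so the result holds more
    elements than the input whenever stride < k.
    """
    h, w = len(image), len(image[0])
    out_h = (h - k) // stride + 1
    out_w = (w - k) // stride + 1
    cols = []
    for i in range(out_h):
        for j in range(out_w):
            cols.append([image[i * stride + di][j * stride + dj]
                         for di in range(k) for dj in range(k)])
    return cols, out_h, out_w


def conv_gemm(image, kernel, stride=1):
    """Convolution as a matrix-vector product over the im2col buffer."""
    k = len(kernel)
    flat_kernel = [v for row in kernel for v in row]
    cols, out_h, out_w = im2col(image, k, stride)
    vals = [sum(a * b for a, b in zip(patch, flat_kernel)) for patch in cols]
    return [vals[r * out_w:(r + 1) * out_w] for r in range(out_h)]


def conv_direct(image, kernel, stride=1):
    """Direct convolution: no extra buffer, but each window is read
    from k separate image rows (strided, non-contiguous accesses)."""
    k = len(kernel)
    h, w = len(image), len(image[0])
    out_h = (h - k) // stride + 1
    out_w = (w - k) // stride + 1
    out = [[0] * out_w for _ in range(out_h)]
    for i in range(out_h):
        for j in range(out_w):
            for di in range(k):
                for dj in range(k):
                    out[i][j] += image[i * stride + di][j * stride + dj] * kernel[di][dj]
    return out


# Illustrative input: a 4 x 4 image and a 3 x 3 kernel of ones.
image = [[r * 4 + c for c in range(4)] for r in range(4)]
kernel = [[1] * 3 for _ in range(3)]

cols, out_h, out_w = im2col(image, 3)
input_elems = sum(len(row) for row in image)
im2col_elems = sum(len(patch) for patch in cols)
```

In this toy case the im2col buffer holds 36 elements for a 16-element input, which is the footprint growth the abstract attributes to GEMM-based convolution. Both functions return the same output, so the difference lies only in memory use and access pattern.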
Taxonomy
Topics: Advanced Neural Network Applications · Stochastic Gradient Optimization Techniques · Machine Learning and ELM
Methods: Convolution
