I/O Lower Bounds for Auto-tuning of Convolutions in CNNs
Xiaoyang Zhang, Junmin Xiao, Guangming Tan

TL;DR
This paper develops I/O lower bounds for CNN convolution algorithms, designs near-optimal dataflow strategies, and employs auto-tuning to significantly improve GPU performance over existing methods like cuDNN and TVM.
Contribution
It introduces a comprehensive I/O lower bound theory for CNN convolutions, and applies it to optimize dataflow and auto-tuning strategies for direct and Winograd algorithms on GPUs.
Findings
Achieves a 3.32x speedup over cuDNN
Faster auto-tuning than TVM
Higher performance than TVM's optimal configurations
Abstract
Convolution is the most time-consuming part of the computation in convolutional neural networks (CNNs), which have achieved great success in numerous applications. Because of the complex data dependencies and the growing number of model samples, convolution incurs high data-movement (i.e., memory-access) overhead. This work provides a comprehensive analysis and methodologies for minimizing communication in CNN convolutions. Building on an in-depth analysis of recent I/O complexity theory under the red-blue pebble game model, we develop a general I/O lower bound theory for composite algorithms that consist of several different sub-computations. Based on this theory, we establish data-movement lower bounds for two representative convolution algorithms in CNNs: direct convolution and the Winograd algorithm. Next, derived from I/O lower bound…
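For readers unfamiliar with the term, "direct convolution" is the straightforward loop nest over output channels, output positions, input channels and filter taps. The sketch below is a textbook reference version (stride-only, no padding, no batching), not the authors' optimized dataflow; the function name `direct_conv2d` and the nested-list layout are illustrative assumptions. It counts multiply-accumulates to show that each input and weight element is touched many times (K·P·Q·C·R·S MACs in total), which is the data reuse that I/O lower-bound analysis and dataflow design try to exploit.

```python
def direct_conv2d(x, w, stride=1):
    """Naive direct convolution (cross-correlation, no padding).

    Hypothetical illustration, not the paper's implementation.
    x: input as nested lists indexed [C][H][W]
    w: filters as nested lists indexed [K][C][R][S]
    Returns (y, macs): output indexed [K][P][Q] and the number of
    multiply-accumulate operations performed.
    """
    C, H, W = len(x), len(x[0]), len(x[0][0])
    K, R, S = len(w), len(w[0][0]), len(w[0][0][0])
    P = (H - R) // stride + 1  # output height
    Q = (W - S) // stride + 1  # output width
    y = [[[0.0] * Q for _ in range(P)] for _ in range(K)]
    macs = 0
    for k in range(K):              # output channel
        for p in range(P):          # output row
            for q in range(Q):      # output column
                acc = 0.0
                for c in range(C):          # input channel
                    for r in range(R):      # filter row
                        for s in range(S):  # filter column
                            acc += x[c][p * stride + r][q * stride + s] * w[k][c][r][s]
                            macs += 1
                y[k][p][q] = acc
    return y, macs


# Usage: one 3x3 input channel, one 2x2 all-ones filter.
y, macs = direct_conv2d([[[1, 2, 3], [4, 5, 6], [7, 8, 9]]],
                        [[[[1, 1], [1, 1]]]])
print(y, macs)  # [[[12.0, 16.0], [24.0, 28.0]]] 16
```

Even in this tiny case, the center input element 5 is read four times; at realistic sizes the order in which such reuse is scheduled through fast memory determines how much data must move.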
Taxonomy
Topics: Advanced Neural Network Applications · Adversarial Robustness in Machine Learning · Stochastic Gradient Optimization Techniques
Methods: Convolution
