Learning from distinctive candidates to optimize reduced-precision convolution program on tensor cores
Junkyeong Choi, Hyucksung Kwon, Woongkyu Lee, Jungwook Choi, Jieun Lim

TL;DR
This paper introduces an automatic scheduling method for reduced-precision convolution on Tensor Cores, leveraging a learning-based search to optimize performance despite data reuse challenges.
Contribution
It presents a novel search algorithm that learns from distinctive candidates to optimize reduced-precision MMA scheduling for convolution operations.
Findings
Achieves substantial speedup over existing methods.
Reduces search time for optimal scheduling.
Improves data reuse in reduced-precision MMA operations.
Abstract
Convolution is one of the fundamental operations of deep neural networks and involves demanding matrix computation. In a graphics processing unit (GPU), the Tensor Core is specialized matrix processing hardware equipped with reduced-precision matrix-multiply-accumulate (MMA) instructions to increase throughput. However, it is challenging to achieve optimal performance since the best scheduling of MMA instructions varies for different convolution sizes. In particular, reduced-precision MMA requires many elements grouped as a matrix operand, seriously limiting data reuse and imposing packing and layout overhead on the schedule. This work proposes an automatic scheduling method of reduced-precision MMA for convolution operations. In this method, we devise a search space that explores the thread tile and warp sizes to increase data reuse despite the large matrix operand of reduced-precision MMA. The…
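To make the idea of a tile-size search space concrete, here is a minimal Python sketch. It is not the paper's algorithm: the MMA operand shape (8x8x32, as used for 4-bit integer operands), the candidate factors, and the reuse metric (multiply-accumulates per operand element loaded) are all assumptions chosen for illustration, and the helper names `gemm_dims`, `reuse`, and `candidates` are hypothetical.

```python
from itertools import product

# Assumed MMA operand shape (M, N, K); illustrative, not taken from the paper.
MMA_M, MMA_N, MMA_K = 8, 8, 32


def gemm_dims(batch, out_h, out_w, out_c, in_c, kh, kw):
    """View a convolution as an implicit GEMM and return its (M, N, K) sizes."""
    return batch * out_h * out_w, out_c, in_c * kh * kw


def reuse(block_m, block_n):
    """MACs per operand element loaded during one K-step of a block tile."""
    macs = block_m * block_n * MMA_K
    loads = (block_m + block_n) * MMA_K
    return macs / loads


def candidates(m, n, factors=(1, 2, 4)):
    """Enumerate hypothetical warp/block tilings, best estimated reuse first."""
    found = []
    for wm, wn, bm, bn in product(factors, repeat=4):
        # A warp tile covers wm x wn MMA operations;
        # a block tile covers bm x bn warp tiles.
        block_m = wm * bm * MMA_M
        block_n = wn * bn * MMA_N
        if block_m > m or block_n > n:
            continue  # tile would exceed the GEMM extent
        found.append({
            "warp_tile": (wm * MMA_M, wn * MMA_N),
            "warps": (bm, bn),
            "block_tile": (block_m, block_n),
            "reuse": reuse(block_m, block_n),
        })
    return sorted(found, key=lambda c: c["reuse"], reverse=True)


if __name__ == "__main__":
    # Example layer: 3x3 convolution, 64 -> 64 channels, 56x56 output.
    m, n, k = gemm_dims(1, 56, 56, 64, 64, 3, 3)
    for cand in candidates(m, n)[:3]:
        print(cand)
```

A real scheduler would also have to account for shared-memory and register limits and for packing and layout overhead, and would measure candidates on hardware rather than rank them by a static estimate.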
Taxonomy
Topics: Parallel Computing and Optimization Techniques · Tensor decomposition and applications · Advanced Neural Network Applications
Methods: Convolution
