Learning from distinctive candidates to optimize reduced-precision convolution program on tensor cores

Junkyeong Choi; Hyucksung Kwon; Woongkyu Lee; Jungwook Choi; Jieun Lim

arXiv:2202.06819·cs.LG·February 25, 2022

Open Access

TL;DR

This paper introduces an automatic scheduling method for reduced-precision convolution on Tensor Cores. It uses a learning-based search to find fast schedules even though the large matrix operands of reduced-precision MMA limit data reuse.

Contribution

It presents a novel search algorithm that learns from distinctive candidates to optimize reduced-precision MMA scheduling for convolution operations.

Findings

01 Achieves substantial speedup over existing methods.

02 Reduces search time for optimal scheduling.

03 Improves data reuse in reduced-precision MMA operations.

Abstract

Convolution is one of the fundamental operations of deep neural networks and demands heavy matrix computation. In a graphics processing unit (GPU), the Tensor Core is specialized matrix-processing hardware equipped with reduced-precision matrix-multiply-accumulate (MMA) instructions to increase throughput. However, it is challenging to achieve optimal performance, since the best scheduling of MMA instructions varies across convolution sizes. In particular, reduced-precision MMA requires many elements grouped as a matrix operand, seriously limiting data reuse and imposing packing and layout overhead on the schedule. This work proposes an automatic scheduling method of reduced-precision MMA for convolution operations. In this method, we devise a search space that explores the thread tile and warp sizes to increase data reuse despite the large matrix operand of reduced-precision MMA. The…
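To make the data-reuse trade-off in the abstract concrete, here is a minimal Python sketch. It is not the paper's search space or algorithm. It assumes the convolution is viewed as an implicit GEMM, uses a hypothetical 8x8x32 MMA operand shape as a stand-in for a reduced-precision instruction, and scores each candidate tile with a simplified reuse metric (multiply-accumulates per operand element loaded). The names `conv_as_gemm`, `reuse`, and `candidates` are illustrative only.

```python
from itertools import product


def conv_as_gemm(batch, height, width, c_in, c_out, kernel, stride=1, pad=0):
    """Implicit-GEMM view of a 2D convolution: returns (M, N, K)."""
    h_out = (height + 2 * pad - kernel) // stride + 1
    w_out = (width + 2 * pad - kernel) // stride + 1
    return batch * h_out * w_out, c_out, c_in * kernel * kernel


def reuse(tile_m, tile_n, tile_k):
    """Simplified reuse metric: MACs per operand element loaded for one tile."""
    macs = tile_m * tile_n * tile_k
    loaded = tile_m * tile_k + tile_k * tile_n
    return macs / loaded


def candidates(gemm, mma=(8, 8, 32), factors=(1, 2, 4, 8)):
    """Enumerate tiles that are whole multiples of the (assumed) MMA shape.

    Only tiles that evenly divide the GEMM dimensions are kept, and the
    result is sorted from highest to lowest reuse.
    """
    m, n, k = gemm
    found = []
    for fm, fn in product(factors, repeat=2):
        tile = (mma[0] * fm, mma[1] * fn, mma[2])
        if m % tile[0] == 0 and n % tile[1] == 0 and k % tile[2] == 0:
            found.append((tile, reuse(*tile)))
    return sorted(found, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical layer: batch 8, 56x56 input, 64 -> 64 channels, 3x3 kernel.
    gemm = conv_as_gemm(8, 56, 56, 64, 64, kernel=3, stride=1, pad=1)
    for tile, score in candidates(gemm)[:3]:
        print(tile, score)
```

Under this metric a single MMA-sized tile reuses each loaded element far less than a tile built from several MMA operations, which is one way to read the abstract's point that tile and warp sizes must grow to recover reuse. A real schedule would also have to account for register and shared-memory limits, packing, and layout, none of which this sketch models.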

Taxonomy

Topics: Parallel Computing and Optimization Techniques · Tensor decomposition and applications · Advanced Neural Network Applications

Methods: Convolution