TL;DR
Korch is a tensor program optimizer that decomposes tensor operators into basic primitives (operator fission) and uses a binary linear programming solver to find an optimal kernel orchestration strategy, outperforming existing tensor program optimizers by up to 1.7x on V100 GPUs and 1.6x on A100 GPUs.
Contribution
Korch introduces a novel approach combining operator fission and constrained optimization to discover optimal kernel orchestration strategies for tensor programs.
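As a rough illustration of operator fission, the sketch below splits a softmax operator into smaller primitives and checks that the decomposed version matches the monolithic one. The choice of softmax, the primitive names, and the pure-Python list representation are all assumptions made for this example; they are not taken from Korch's implementation.

```python
import math

# Hypothetical primitives for illustration only (not Korch's API).
def reduce_max(x):
    return max(x)

def sub_scalar(x, c):
    return [v - c for v in x]

def exp_elementwise(x):
    return [math.exp(v) for v in x]

def reduce_sum(x):
    return sum(x)

def div_scalar(x, c):
    return [v / c for v in x]

def softmax_monolithic(x):
    # The operator as a single opaque unit.
    m = max(x)
    e = [math.exp(v - m) for v in x]
    s = sum(e)
    return [v / s for v in e]

def softmax_fissioned(x):
    # The same computation expressed as a chain of primitives.
    # Each primitive is now a separate node that an optimizer
    # could regroup into kernels in different ways.
    m = reduce_max(x)
    shifted = sub_scalar(x, m)
    e = exp_elementwise(shifted)
    s = reduce_sum(e)
    return div_scalar(e, s)

logits = [1.0, 2.0, 3.0]
mono = softmax_monolithic(logits)
fiss = softmax_fissioned(logits)
```

Once the operator is expressed as five primitives rather than one opaque unit, an optimizer has more freedom in how it groups them into kernels, which is the opportunity the contribution statement refers to.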
Findings
Outperforms existing tensor program optimizers by up to 1.7x on V100 GPUs
Outperforms existing tensor program optimizers by up to 1.6x on A100 GPUs
Effective across a variety of DNN models
Abstract
Kernel orchestration is the task of mapping the computation defined in different operators of a deep neural network (DNN) to the execution of GPU kernels on modern hardware platforms. Prior approaches optimize kernel orchestration by greedily applying operator fusion, which fuses the computation of multiple operators into a single kernel, and miss a variety of optimization opportunities in kernel orchestration. This paper presents Korch, a tensor program optimizer that discovers optimal kernel orchestration strategies for tensor programs. Instead of directly fusing operators, Korch first applies operator fission to decompose tensor operators into a small set of basic tensor algebra primitives. This decomposition enables a diversity of fine-grained, inter-operator optimizations. Next, Korch optimizes kernel orchestration by formalizing it as a constrained optimization problem,…
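To make the constrained-optimization framing more concrete, here is a toy sketch of kernel selection as a 0/1 choice problem: each candidate kernel covers a set of primitives and has a latency, and the goal is to pick the cheapest set of kernels that covers every primitive. The candidate kernels, their latencies, and the brute-force search are all assumptions for illustration. The abstract above is truncated before it describes the actual formulation, so this sketch only models a coverage constraint and does not claim to reproduce Korch's constraints or solver.

```python
from itertools import product

# Primitives produced by a hypothetical fission of softmax.
primitives = ["reduce_max", "sub", "exp", "reduce_sum", "div"]

# Hypothetical candidate kernels: (primitives covered, made-up latency in microseconds).
candidates = {
    "k_max": ({"reduce_max"}, 4.0),
    "k_sub_exp": ({"sub", "exp"}, 3.0),
    "k_sum": ({"reduce_sum"}, 4.0),
    "k_div": ({"div"}, 3.0),
    "k_max_sub_exp_sum": ({"reduce_max", "sub", "exp", "reduce_sum"}, 7.0),
    "k_all": (set(primitives), 12.0),
}

def best_orchestration(primitives, candidates):
    """Enumerate every 0/1 assignment of kernels and keep the cheapest
    one whose kernels together cover all primitives. Brute force stands
    in for a real solver and is only viable for tiny instances."""
    names = sorted(candidates)
    required = set(primitives)
    best_cost, best_choice = float("inf"), None
    for bits in product((0, 1), repeat=len(names)):
        chosen = [n for n, b in zip(names, bits) if b]
        covered = set()
        for n in chosen:
            covered |= candidates[n][0]
        if covered != required:
            continue  # coverage constraint violated
        cost = sum(candidates[n][1] for n in chosen)
        if cost < best_cost:
            best_cost, best_choice = cost, chosen
    return best_cost, best_choice

cost, choice = best_orchestration(primitives, candidates)
print(cost, choice)
```

With these made-up numbers, fusing everything into one kernel (12.0) and running four small kernels (14.0) both lose to a mixed strategy of one four-primitive kernel plus a separate division kernel (10.0), which is the kind of non-greedy choice a global search can find.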
