Optimal Kernel Orchestration for Tensor Programs with Korch

Muyan Hu; Ashwin Venkatram; Shreyashri Biswas; Balamurugan Marimuthu; Bohan Hou; Gabriele Oliaro; Haojie Wang; Liyan Zheng; Xupeng Miao; Jidong Zhai

arXiv:2406.09465 · cs.DS · June 17, 2024



TL;DR

Korch is a tensor program optimizer that decomposes tensor operators into basic primitives and uses linear programming to find an optimal kernel orchestration, improving the GPU execution of deep neural networks by up to 1.7x on V100 and 1.6x on A100 GPUs.

Contribution

Korch introduces a new approach that combines operator fission, which splits operators into basic tensor algebra primitives, with constrained optimization to discover optimal kernel orchestration strategies for tensor programs.
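To make operator fission concrete, the sketch below expresses softmax, normally exposed as one monolithic operator, as a chain of five primitives (a reduction, a broadcast subtraction, an elementwise exponential, a second reduction, and a broadcast division) and checks that the chain reproduces the operator's result. The primitive names and this particular decomposition are hypothetical stand-ins chosen for illustration; they are not taken from Korch's code or primitive set.

```python
import math

# Hypothetical primitive set, for illustration only; these names are
# not taken from Korch's codebase.

def reduce_max(row):
    """Reduction primitive: maximum over the row."""
    return max(row)

def broadcast_sub(row, scalar):
    """Broadcast primitive: subtract one scalar from every element."""
    return [x - scalar for x in row]

def elementwise_exp(row):
    """Elementwise primitive: exponential of every element."""
    return [math.exp(x) for x in row]

def reduce_sum(row):
    """Reduction primitive: sum over the row."""
    return sum(row)

def broadcast_div(row, scalar):
    """Broadcast primitive: divide every element by one scalar."""
    return [x / scalar for x in row]

def softmax_operator(row):
    """Softmax as a single, monolithic operator (numerically stable form)."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def softmax_fissioned(row):
    """The same softmax rewritten as a chain of five primitives."""
    m = reduce_max(row)                # primitive 1: reduction
    shifted = broadcast_sub(row, m)    # primitive 2: broadcast
    exps = elementwise_exp(shifted)    # primitive 3: elementwise
    total = reduce_sum(exps)           # primitive 4: reduction
    return broadcast_div(exps, total)  # primitive 5: broadcast

if __name__ == "__main__":
    x = [1.0, 2.0, 3.0]
    print(softmax_operator(x))
    print(softmax_fissioned(x))
```

Once softmax is split this way, an orchestrator is no longer bound to the original operator boundary: it could, for example, place the exponential and the sum in one kernel and the final division in another, or group a primitive with work from a neighboring operator.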

Findings

01. Outperforms existing optimizers by up to 1.7x on V100 GPUs

02. Achieves up to 1.6x speedup on A100 GPUs

03. Effective across a variety of DNN models

Abstract

Kernel orchestration is the task of mapping the computation defined in different operators of a deep neural network (DNN) to the execution of GPU kernels on modern hardware platforms. Prior approaches optimize kernel orchestration by greedily applying operator fusion, which fuses the computation of multiple operators into a single kernel, and miss a variety of optimization opportunities in kernel orchestration. This paper presents Korch, a tensor program optimizer that discovers optimal kernel orchestration strategies for tensor programs. Instead of directly fusing operators, Korch first applies operator fission to decompose tensor operators into a small set of basic tensor algebra primitives. This decomposition enables a diversity of fine-grained, inter-operator optimizations. Next, Korch optimizes kernel orchestration by formalizing it as a constrained optimization problem,…
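The contrast between greedy operator fusion and constrained optimization can be illustrated with a toy instance. The sketch below is not Korch's implementation: it assigns a binary variable x_k to each candidate kernel k with latency c_k and minimizes the sum of c_k * x_k, subject to every primitive being computed by exactly one selected kernel. The primitives (those of a decomposed softmax), the candidate kernels, and all latencies are hypothetical; the paper's formulation is richer than this single coverage constraint, and a real system would hand the program to a linear programming solver rather than enumerate subsets.

```python
from itertools import combinations

# Toy instance, all values hypothetical: five primitives and candidate
# kernels, each covering a group of primitives with an invented latency
# (arbitrary units).
PRIMITIVES = frozenset({"max", "sub", "exp", "sum", "div"})
CANDIDATES = {
    "max": (frozenset({"max"}), 3),
    "sub": (frozenset({"sub"}), 2),
    "exp": (frozenset({"exp"}), 2),
    "sum": (frozenset({"sum"}), 3),
    "div": (frozenset({"div"}), 2),
    "max+sub+exp": (frozenset({"max", "sub", "exp"}), 9),
    "sub+exp+sum": (frozenset({"sub", "exp", "sum"}), 5),
    "sum+div": (frozenset({"sum", "div"}), 3),
    "sub+exp": (frozenset({"sub", "exp"}), 3),
}

def cost(plan):
    """Objective: total latency of the selected kernels (sum of c_k * x_k)."""
    return sum(CANDIDATES[k][1] for k in plan)

def is_exact_cover(plan):
    """Constraint: every primitive is computed by exactly one selected kernel."""
    covered = [p for k in plan for p in CANDIDATES[k][0]]
    return len(covered) == len(set(covered)) and set(covered) == PRIMITIVES

def optimal_plan():
    """Solve the 0/1 program by brute force over all kernel subsets.

    Only feasible for toy sizes; larger graphs need a real solver.
    """
    best = None
    names = list(CANDIDATES)
    for r in range(1, len(names) + 1):
        for plan in combinations(names, r):
            if is_exact_cover(plan) and (best is None or cost(plan) < cost(best)):
                best = plan
    return best

def greedy_fusion_plan():
    """Greedy baseline: commit to the largest non-overlapping kernel first."""
    order = sorted(CANDIDATES, key=lambda k: (-len(CANDIDATES[k][0]), CANDIDATES[k][1]))
    plan, covered = [], set()
    for k in order:
        prims = CANDIDATES[k][0]
        if not (prims & covered):
            plan.append(k)
            covered |= prims
    return plan

if __name__ == "__main__":
    for label, plan in (("greedy", greedy_fusion_plan()), ("optimal", optimal_plan())):
        print(label, sorted(plan), cost(plan))
```

With these made-up latencies, the greedy pass commits to the three-primitive kernel "sub+exp+sum" and ends at 10 units, while the exhaustive search finds a 9-unit plan ("max", "sub+exp", "sum+div") that groups the primitives differently. The toy numbers only show that greedy fusion can miss a cheaper orchestration; they say nothing about the speedups reported for Korch.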


Code & Models

Repositories

humuyan/korch (official)

