PowerFusion: A Tensor Compiler with Explicit Data Movement Description and Instruction-level Graph IR

Zixuan Ma; Haojie Wang; Jingze Xing; Liyan Zheng; Chen Zhang; Huanqi Cao; Kezhao Huang; Shizhi Tang; Penghan Wang; Jidong Zhai

arXiv:2307.04995·cs.LG·July 12, 2023·1 cites

Open Access

TL;DR

PowerFusion introduces a tensor compiler that explicitly models data movement and computation, enabling more memory-efficient code generation for DNNs across various hardware accelerators.

Contribution

It proposes GIR, an IR that includes data movement primitives, and a holistic optimization approach for memory-intensive operators in tensor compilation.

Findings

01 Achieves up to 16.91x speedup on MLU

02 Outperforms existing frameworks on GPU and MLU

03 Demonstrates effective memory and computation optimization

Abstract

Deep neural networks (DNNs) are critical in many domains. To accelerate DNN computation, tensor compilers have been proposed to generate efficient code for different domain-specific accelerators. Existing tensor compilers mainly focus on optimizing computation efficiency. However, memory access is becoming a key performance bottleneck because the computational performance of accelerators is increasing much faster than their memory performance. Because the intermediate representations (IRs) of current tensor compilers lack a direct description of memory access and data dependence, generating memory-efficient code is a significant challenge. In this paper, we propose IntelliGen, a tensor compiler that can generate high-performance code for memory-intensive operators by considering both computation and data movement optimizations. IntelliGen represents a DNN program using GIR, which includes…
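To make the memory-bottleneck argument concrete, the following is a minimal sketch of why fusing memory-intensive operators reduces data movement. It is not GIR and not taken from the paper: the operator chain `relu(x * scale + bias)`, the function names, and the per-element traffic model (every unfused kernel reads its input from and writes its output to main memory) are all assumptions chosen for illustration.

```python
# Hypothetical illustration only: none of these names come from the paper.
# Traffic is counted in tensor elements moved to or from main memory.

def unfused_traffic(n: int) -> int:
    """Three separate kernels; each reads one tensor and writes one tensor."""
    mul = n + n   # read x,  write t1
    add = n + n   # read t1, write t2
    relu = n + n  # read t2, write y
    return mul + add + relu


def fused_traffic(n: int) -> int:
    """One kernel: read x once, keep intermediates local, write y once."""
    return n + n


def unfused_kernels(x, scale, bias):
    """Each step materializes a full intermediate list (stand-in for memory)."""
    t1 = [v * scale for v in x]
    t2 = [v + bias for v in t1]
    return [max(v, 0.0) for v in t2]


def fused_kernel(x, scale, bias):
    """Same result, but intermediates live only in a per-element expression."""
    return [max(v * scale + bias, 0.0) for v in x]
```

Under this simplified model the fused version moves one third of the data of the unfused one while producing identical results, which is the kind of saving a compiler can only find if its IR exposes data movement alongside computation.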

Taxonomy

Topics: Parallel Computing and Optimization Techniques · Tensor decomposition and applications · Advanced Neural Network Applications