ALT: Boosting Deep Learning Performance by Breaking the Wall between Graph and Operator Level Optimizations

Zhiying Xu; Jiafan Xu; Hongding Peng; Wei Wang; Xiaoliang Wang; Haoran Wan; Haipeng Dai; Yixu Xu; Hao Cheng; Kun Wang; Guihai Chen

arXiv:2210.12415·cs.LG·November 1, 2022

Open Access

TL;DR

ALT is a deep learning compiler that jointly tunes graph-level data layouts and operator-level loops, which speeds up inference on heterogeneous hardware.

Contribution

It introduces a joint optimization framework that combines graph-level and operator-level tuning, which existing compilers handle in separate stages.

Findings

01. Achieves 1.5x speedup on single operators

02. Attains 1.4x faster end-to-end inference

03. Outperforms state-of-the-art compilers like Ansor

Abstract

Deep learning models rely on highly optimized tensor libraries for efficient inference on heterogeneous hardware. Current deep learning compilers typically predetermine the layouts of tensors and then optimize the loops of operators. However, this unidirectional, one-off workflow strictly separates graph-level optimization and operator-level optimization into different system layers, missing opportunities for unified tuning. This paper proposes ALT, a compiler that performs joint graph- and operator-level optimizations for deep models. ALT provides a generic transformation module to manipulate layouts and loops with easy-to-use primitive functions. ALT further integrates an auto-tuning module that jointly optimizes graph-level data layouts and operator-level loops while guaranteeing efficiency. Experimental results show that ALT significantly outperforms state-of-the-art compilers (e.g., Ansor) in…
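To make the abstract's contrast concrete, here is a minimal toy sketch of why fixing the layout first and tuning loops second can miss the best combination. It is not ALT's algorithm or API: the layout names, tile sizes, cost numbers, and the functions `toy_cost`, `sequential_tuning`, and `joint_tuning` are all hypothetical, and the exhaustive search stands in for whatever search strategy the paper actually uses.

```python
from itertools import product

# Illustrative search space; none of these values come from the paper.
LAYOUTS = ("NCHW", "NHWC", "NCHW4c")   # candidate graph-level data layouts
TILES = (1, 2, 4, 8, 16)               # candidate operator-level loop tile sizes

# Hypothetical cost model parameters (made-up numbers, not measurements).
BASE_COST = {"NCHW": 10.0, "NHWC": 9.0, "NCHW4c": 6.0}
PREFERRED_TILE = {"NCHW": 4, "NHWC": 16, "NCHW4c": 8}


def toy_cost(layout, tile):
    """Made-up cost: per-layout base plus a penalty for a mismatched tile."""
    return BASE_COST[layout] + abs(tile - PREFERRED_TILE[layout])


def sequential_tuning(default_tile=16):
    """One-off workflow: choose the layout under a default loop schedule,
    then tune the loop tile with the layout frozen."""
    layout = min(LAYOUTS, key=lambda l: toy_cost(l, default_tile))
    tile = min(TILES, key=lambda t: toy_cost(layout, t))
    return layout, tile, toy_cost(layout, tile)


def joint_tuning():
    """Joint workflow: search layouts and loop tiles together.
    Exhaustive here only because the toy space is tiny."""
    layout, tile = min(product(LAYOUTS, TILES), key=lambda lt: toy_cost(*lt))
    return layout, tile, toy_cost(layout, tile)


if __name__ == "__main__":
    print("sequential:", sequential_tuning())
    print("joint:     ", joint_tuning())
```

In this toy setup the sequential path commits to the layout that looks best under the default tile and never revisits it, while the joint search finds a different layout whose best tile gives a lower cost. A real compiler faces a far larger space, so it cannot enumerate every pair as this sketch does.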

Taxonomy

Topics: Parallel Computing and Optimization Techniques · Advanced Neural Network Applications · Ferroelectric and Negative Capacitance Devices