TLP: A Deep Learning-based Cost Model for Tensor Program Tuning

Yi Zhai; Yu Zhang; Shuo Liu; Xiaomeng Chu; Jie Peng; Jianmin Ji; Yanyong Zhang

arXiv:2211.03578·cs.LG·November 23, 2022

TL;DR

This paper introduces TLP, a deep learning-based cost model that uses tensor language processing to improve tensor program tuning, achieving significant speed-ups and better cross-hardware performance.

Contribution

It proposes TLP and MTL-TLP, novel models that extract features from schedule primitives and treat latency prediction as an NLP regression task, addressing feature extraction and cross-hardware issues.
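The idea of reading features directly off schedule primitives can be illustrated with a minimal sketch. It is not the paper's actual encoding: the primitive names (`split`, `reorder`, `vectorize`), the vocabulary scheme, and the helpers `build_vocab` and `encode` are all hypothetical, and mixing vocabulary ids with raw numeric parameters in one row is a simplification made for brevity.

```python
from typing import Dict, List, Union

Token = Union[str, int]

# Hypothetical schedule: each primitive is a kind followed by its parameters.
SCHEDULE: List[List[Token]] = [
    ["split", "i", 8, 4],
    ["reorder", "i", "j"],
    ["vectorize", "j"],
]


def build_vocab(schedules: List[List[List[Token]]]) -> Dict[str, int]:
    """Assign an integer id to every string token; id 0 is reserved for padding."""
    vocab: Dict[str, int] = {"<pad>": 0}
    for schedule in schedules:
        for primitive in schedule:
            for token in primitive:
                if isinstance(token, str) and token not in vocab:
                    vocab[token] = len(vocab)
    return vocab


def encode(schedule: List[List[Token]], vocab: Dict[str, int],
           max_prims: int, max_len: int) -> List[List[float]]:
    """Turn a primitive sequence into a fixed-size max_prims x max_len matrix.

    String tokens become vocabulary ids, numeric parameters are kept as-is,
    and both dimensions are truncated or zero-padded to a fixed shape.
    """
    rows: List[List[float]] = []
    for primitive in schedule[:max_prims]:
        row = [float(vocab[t]) if isinstance(t, str) else float(t)
               for t in primitive[:max_len]]
        row += [0.0] * (max_len - len(row))
        rows.append(row)
    while len(rows) < max_prims:
        rows.append([0.0] * max_len)
    return rows


vocab = build_vocab([SCHEDULE])
features = encode(SCHEDULE, vocab, max_prims=4, max_len=5)
```

A matrix of this shape is the kind of input a sequence model could regress latency from, in the same way a text model consumes a padded token sequence.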

Findings

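1. TLP speeds up the average search time by 9.1X on CPU workloads and 3.0X on GPU workloads compared to the state-of-the-art implementation.

2. MTL-TLP achieves speed-ups of 4.7X on CPU workloads and 2.9X on GPU workloads using only 7% of the target hardware data.

3. Both models outperform the state of the art in tensor program tuning.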

Abstract

Tensor program tuning is a non-convex objective optimization problem, to which search-based approaches have proven to be effective. At the core of the search-based approaches lies the design of the cost model. Though deep learning-based cost models perform significantly better than other methods, they still fall short and suffer from the following problems. First, their feature extraction heavily relies on expert-level domain knowledge in hardware architectures. Even so, the extracted features are often unsatisfactory and require separate considerations for CPUs and GPUs. Second, a cost model trained on one hardware platform usually performs poorly on another, a problem we call cross-hardware unavailability. In order to address these problems, we propose TLP and MTL-TLP. TLP is a deep learning-based cost model that facilitates tensor program tuning. Instead of extracting features from…
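The role a cost model plays in search-based tuning can be sketched as follows: the model scores many candidate programs cheaply, and only the most promising ones are measured on hardware. This is a generic illustration, not the TLP implementation; `toy_cost_model`, `toy_measure`, and the tile-size candidates are hypothetical stand-ins.

```python
from typing import Callable, List


def select_top_k(candidates: List[int],
                 cost_model: Callable[[int], float],
                 k: int) -> List[int]:
    """Rank candidates by predicted latency (lower is better) and keep the best k."""
    return sorted(candidates, key=cost_model)[:k]


def toy_cost_model(tile: int) -> float:
    """Hypothetical predictor: pretends latency is lowest near a tile size of 32."""
    return abs(tile - 32) + 1.0


def toy_measure(tile: int) -> float:
    """Hypothetical stand-in for a real, expensive hardware measurement."""
    return float(abs(tile - 20))


# Candidate tile sizes standing in for candidate tensor programs.
candidates = [1, 2, 4, 8, 16, 32, 64, 128, 256]

# Cheap step: the cost model prunes nine candidates down to three.
top = select_top_k(candidates, toy_cost_model, k=3)

# Expensive step: only the shortlisted candidates are actually measured.
best = min(top, key=toy_measure)
```

The better the cost model's ranking agrees with real measurements, the fewer hardware runs the search needs, which is where search-time speed-ups come from.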

Code & Models

Repositories

zhaiyi000/tlp (Official)

Taxonomy

Topics: Parallel Computing and Optimization Techniques · Advanced Neural Network Applications · Tensor decomposition and applications

Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings