TLP: A Deep Learning-based Cost Model for Tensor Program Tuning
Yi Zhai, Yu Zhang, Shuo Liu, Xiaomeng Chu, Jie Peng, Jianmin Ji, Yanyong Zhang

TL;DR
This paper introduces TLP, a deep learning-based cost model that uses tensor language processing to improve tensor program tuning, achieving significant speed-ups and better cross-hardware performance.
Contribution
It proposes TLP and MTL-TLP, novel models that extract features from schedule primitives and treat latency prediction as an NLP regression task, addressing feature extraction and cross-hardware issues.
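To make the idea of extracting features from schedule primitives more concrete, here is a minimal sketch of how a sequence of primitives could be turned into a fixed-size numeric matrix that a sequence regression model might consume. Everything in it is an assumption made for illustration: the primitive vocabulary `PRIMITIVE_KINDS`, the helper names `encode_primitive` and `encode_schedule`, the integer ids assigned to name parameters, and the padding sizes are hypothetical and are not taken from the paper or its code.

```python
from typing import Dict, List, Sequence, Tuple, Union

# Hypothetical primitive vocabulary, chosen for illustration only.
PRIMITIVE_KINDS = ["split", "reorder", "fuse", "annotate"]

Param = Union[int, float, str]
Primitive = Tuple[str, Sequence[Param]]


def encode_primitive(kind: str, params: Sequence[Param], width: int,
                     name_vocab: Dict[str, int]) -> List[float]:
    """Encode one primitive as a one-hot kind followed by its parameters."""
    one_hot = [1.0 if kind == k else 0.0 for k in PRIMITIVE_KINDS]
    values = []
    for p in params:
        if isinstance(p, str):
            # Name parameters get a small integer id on first sight (assumption).
            values.append(float(name_vocab.setdefault(p, len(name_vocab) + 1)))
        else:
            # Numeric parameters are kept as plain floats.
            values.append(float(p))
    row = one_hot + values
    # Pad with zeros or crop so every row has the same width.
    return (row + [0.0] * width)[:width]


def encode_schedule(schedule: Sequence[Primitive], seq_len: int = 4,
                    width: int = 8) -> List[List[float]]:
    """Encode a primitive sequence as a seq_len x width matrix."""
    name_vocab: Dict[str, int] = {}
    rows = [encode_primitive(k, p, width, name_vocab) for k, p in schedule]
    padding = [[0.0] * width for _ in range(seq_len)]
    # Pad with all-zero rows or crop so every schedule has the same length.
    return (rows + padding)[:seq_len]


schedule = [("split", ["i", 8, 4]), ("reorder", ["i", "j"])]
features = encode_schedule(schedule)
```

In this sketch the resulting matrix plays the role of a tokenized sentence: a regression model would map it to a predicted latency, with no hand-written hardware features involved.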
Findings
TLP speeds up the average search time by 9.1X on CPU workloads and 3.0X on GPU workloads.
MTL-TLP achieves speed-ups of 4.7X on CPU and 2.9X on GPU using only 7% of the target hardware data.
Both models outperform the state of the art in tensor program tuning.
Abstract
Tensor program tuning is a non-convex objective optimization problem, to which search-based approaches have proven to be effective. At the core of the search-based approaches lies the design of the cost model. Though deep learning-based cost models perform significantly better than other methods, they still fall short and suffer from the following problems. First, their feature extraction heavily relies on expert-level domain knowledge in hardware architectures. Even so, the extracted features are often unsatisfactory and require separate considerations for CPUs and GPUs. Second, a cost model trained on one hardware platform usually performs poorly on another, a problem we call cross-hardware unavailability. In order to address these problems, we propose TLP and MTL-TLP. TLP is a deep learning-based cost model that facilitates tensor program tuning. Instead of extracting features from…
Taxonomy
Topics: Parallel Computing and Optimization Techniques · Advanced Neural Network Applications · Tensor decomposition and applications
