FTuner: A Fast Dynamic Shape Tensors Program Auto-Tuner for Deep Learning Compilers

Pengyu Mu; Linquan Wei; Yi Liu; Rui Wang

arXiv:2407.21418 · cs.LG · August 1, 2024 · 1 cites

Open Access

TL;DR

FTuner is a novel auto-tuning technique for deep learning compilers that efficiently matches dynamic tensor shapes using an abstract computational unit, significantly reducing tuning time while maintaining high performance.

Contribution

Introduces FTuner, a new method that uses the uKernel concept and hardware modeling to quickly optimize dynamic tensor shapes without extensive training or large design spaces.

Findings

01 Achieves a 3% speedup over existing auto-tuners.

02 Reduces tuning time by two orders of magnitude.

03 Matches the performance of vendor libraries.

Abstract

Many artificial intelligence models process input data of different lengths and resolutions, making the shape of the tensors dynamic. The performance of these models depends on the shape of the tensors, which makes it difficult to optimize the tensors before the model runs. There are two common solutions to this problem. The first is to pad the input with useless data so that it matches a pre-optimized tensor library. The second is to combine small basic tensors into a tensor that is closest in size to the input data and then tune it to minimize padding. However, this second solution can be time-consuming. This paper proposes a new technique for deep learning compilers called FTuner. Instead of using a large design space or training a cost model, we use an abstract computational unit called the uKernel to patch together small tensors of various sizes to match the shape of the input tensor. We…
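The padding trade-off described in the abstract can be made concrete with a small one-dimensional sketch. This is not FTuner's algorithm or its uKernel implementation. The tile sizes, the function names (`padded_length`, `greedy_cover`), and the greedy largest-first strategy are all assumptions chosen for illustration. The sketch compares padding a dynamic length up to a multiple of one fixed tile against covering it with a mix of tile sizes.

```python
from math import ceil


def padded_length(n: int, tile: int) -> int:
    """Smallest multiple of `tile` that is >= n (single fixed tile size)."""
    return ceil(n / tile) * tile


def greedy_cover(n: int, tiles: tuple[int, ...]) -> list[int]:
    """Cover a length-n dimension with a mix of tile sizes.

    Hypothetical strategy: take as many of the largest tiles as fit,
    then move to smaller ones; pad any leftover with one smallest tile.
    """
    remaining = n
    chosen: list[int] = []
    for t in sorted(tiles, reverse=True):
        count, remaining = divmod(remaining, t)
        chosen.extend([t] * count)
    if remaining > 0:
        # The leftover is smaller than the smallest tile, so one such tile covers it.
        chosen.append(min(tiles))
    return chosen


if __name__ == "__main__":
    n = 100                      # dynamic dimension, e.g. a sequence length
    tiles = (64, 32, 16, 8)      # assumed tile sizes, for illustration only

    fixed = padded_length(n, 64)
    mixed = greedy_cover(n, tiles)

    print("fixed tile 64:", fixed, "elements, padding", fixed - n)
    print("mixed tiles:  ", mixed, "padding", sum(mixed) - n)
```

In this toy setting, a length of 100 padded to a fixed tile of 64 wastes 28 elements, while the mixed cover wastes 4. A real tuner also has to account for how efficiently each tile size runs on the hardware, which this sketch ignores.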

Taxonomy

Topics: Parallel Computing and Optimization Techniques · Computational Physics and Python Applications · Tensor decomposition and applications