LoopTune: Optimizing Tensor Computations with Reinforcement Learning

Dejan Grubisic; Bram Wasti; Chris Cummins; John Mellor-Crummey; Aleksandar Zlateski

arXiv:2309.01825 · cs.LG · November 9, 2023

Open Access

TL;DR

LoopTune uses deep reinforcement learning to optimize tensor computations on CPUs. It speeds up LoopNest by 3.2x and produces code that is 2.8x faster than MetaSchedule and 1.08x faster than AutoTVM, while tuning in seconds.

Contribution

LoopTune introduces an RL-based compiler framework that optimizes tensor traversal order for deep learning models on the CPU and relies on the lightweight code generator LoopNest for hardware-specific optimizations.
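To make "tensor traversal" concrete, the sketch below runs the same matrix multiplication under every ordering of its three loops and shows one loop-swapping action of the kind an RL agent could choose. It is only an illustration: `matmul_with_order` and `swap_down` are hypothetical names, not LoopTune's or LoopNest's API, and the actual action space of the paper is not reproduced here.

```python
from itertools import permutations


def matmul_with_order(a, b, order):
    """Multiply a (M x K) by b (K x N), nesting the loops in the given order.

    `order` is a permutation of ("m", "n", "k"), outermost loop first.
    Hypothetical helper for illustration only.
    """
    sizes = {"m": len(a), "k": len(b), "n": len(b[0])}
    c = [[0] * sizes["n"] for _ in range(sizes["m"])]
    outer, middle, inner = order
    for x in range(sizes[outer]):
        for y in range(sizes[middle]):
            for z in range(sizes[inner]):
                # Map the three loop counters back to the named indices.
                i = dict(zip(order, (x, y, z)))
                c[i["m"]][i["n"]] += a[i["m"]][i["k"]] * b[i["k"]][i["n"]]
    return c


def swap_down(order, cursor):
    """Assumed action: exchange the loop at `cursor` with the one nested below it."""
    order = list(order)
    if cursor + 1 < len(order):
        order[cursor], order[cursor + 1] = order[cursor + 1], order[cursor]
    return tuple(order)


a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]

# Every loop order computes the same product; only the traversal differs.
results = {order: matmul_with_order(a, b, order) for order in permutations("mnk")}
assert all(r == [[19, 22], [43, 50]] for r in results.values())

# One action turns the (m, n, k) schedule into (m, k, n).
new_order = swap_down(("m", "n", "k"), 1)
```

All six orders give the same result, so the choice among them is purely a performance decision. In compiled code the orders differ in memory access pattern, which is where the tuning matters. Pure Python will not show that difference in any meaningful way.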

Findings

01 LoopTune speeds up LoopNest by 3.2x.

02 The generated code runs an order of magnitude faster than code produced by TVM.

03 Tunes code in seconds.

Abstract

Advanced compiler technology is crucial for enabling machine learning applications to run on novel hardware, but traditional compilers fail to deliver performance, popular auto-tuners have long search times and expert-optimized libraries introduce unsustainable costs. To address this, we developed LoopTune, a deep reinforcement learning compiler that optimizes tensor computations in deep learning models for the CPU. LoopTune optimizes tensor traversal order while using the ultra-fast lightweight code generator LoopNest to perform hardware-specific optimizations. With a novel graph-based representation and action space, LoopTune speeds up LoopNest by 3.2x, generating an order of magnitude faster code than TVM, 2.8x faster than MetaSchedule, and 1.08x faster than AutoTVM, consistently performing at the level of the hand-tuned library Numpy. Moreover, LoopTune tunes code in order of…

Taxonomy

Topics: Parallel Computing and Optimization Techniques · Tensor decomposition and applications · Advanced Neural Network Applications

Methods: Lib · fail