Towards a high-performance AI compiler with upstream MLIR

Renato Golin; Lorenzo Chelini; Adam Siemieniuk; Kavitha Madhu; Niranjan Hasabnis; Hans Pabst; Evangelos Georganas; Alexander Heinecke

arXiv:2404.15204 · cs.PL · April 24, 2024 · 1 citation

TL;DR

This paper introduces a compilation flow that uses open-source passes to optimize high-level linear algebra IR for high-performance AI applications, reaching over 90% of the performance of hand-optimized code.

Contribution

It presents a novel compilation pipeline with cache-aware tensor distribution, shape propagation, and micro-kernel lowering for efficient AI model execution.
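
A minimal pure-Python sketch of the general idea behind packing and micro-kernel lowering follows. It is not the tpp-mlir implementation. `pack` copies a 2-D matrix into a blocked layout of contiguous tiles, similar in spirit to the `tensor.pack` op in upstream MLIR with both dimensions tiled. It assumes sizes divide evenly, so no padding is needed. `tile_gemm` and `blocked_matmul` are hypothetical names. The loop order and layouts were chosen for readability and need not match what tpp-mlir actually selects.

```python
from typing import List

Matrix = List[List[float]]
Blocked = List[List[Matrix]]  # indexed [outer_row][outer_col][inner_row][inner_col]


def pack(a: Matrix, rb: int, cb: int) -> Blocked:
    """Copy a row-major matrix into contiguous rb x cb tiles (no padding)."""
    rows, cols = len(a), len(a[0])
    assert rows % rb == 0 and cols % cb == 0, "sketch assumes exact tiling"
    return [
        [
            [[a[bi * rb + i][bj * cb + j] for j in range(cb)] for i in range(rb)]
            for bj in range(cols // cb)
        ]
        for bi in range(rows // rb)
    ]


def unpack(p: Blocked) -> Matrix:
    """Inverse of pack: restore the original row-major layout."""
    rb, cb = len(p[0][0]), len(p[0][0][0])
    return [
        [p[r // rb][c // cb][r % rb][c % cb] for c in range(len(p[0]) * cb)]
        for r in range(len(p) * rb)
    ]


def tile_gemm(a: Matrix, b: Matrix, c: Matrix) -> None:
    """Hypothetical micro-kernel: C_tile += A_tile @ B_tile, updated in place."""
    for i in range(len(a)):
        for k in range(len(b)):
            aik = a[i][k]
            for j in range(len(b[0])):
                c[i][j] += aik * b[k][j]


def blocked_matmul(a: Matrix, b: Matrix, mb: int, nb: int, kb: int) -> Matrix:
    """C = A @ B computed tile by tile on packed operands."""
    pa = pack(a, mb, kb)  # A: M x K -> [M/mb][K/kb][mb][kb]
    pb = pack(b, kb, nb)  # B: K x N -> [K/kb][N/nb][kb][nb]
    mo, ko, no = len(pa), len(pb), len(pb[0])
    pc = [[[[0.0] * nb for _ in range(mb)] for _ in range(no)] for _ in range(mo)]
    # The independent (bi, bj) tile loops are the ones a multi-core
    # scheme could distribute across threads.
    for bi in range(mo):
        for bj in range(no):
            # Reduction over K tiles: the loop a batch-reduce GEMM
            # micro-kernel would fold into a single call.
            for bk in range(ko):
                tile_gemm(pa[bi][bk], pb[bk][bj], pc[bi][bj])
    return unpack(pc)
```

For example, packing the 4x4 matrix with entries 0 to 15 (row-major) into 2x2 tiles places `[[2, 3], [6, 7]]` at index `[0][1]`. Each tile is then a small, contiguous block that fits in cache and can be handed to a fixed-size kernel.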

Findings

01

Achieves over 90% of the performance of hand-optimized code

02

Accepts Linalg-on-Tensor IR produced from TensorFlow and PyTorch

03

Includes cache-aware tensor distribution and micro-kernel lowering

Abstract

This work proposes a compilation flow that uses open-source compiler passes to build a framework that achieves ninja performance from a generic, high-level linear algebra abstraction. We demonstrate this flow with a proof-of-concept MLIR project that takes Linalg-on-Tensor input IR from TensorFlow and PyTorch, performs cache-level optimizations, and lowers to micro-kernels for efficient vectorization. It achieves over 90% of the performance of equivalent ninja-written programs. The contributions of this work include: (1) packing primitives on the tensor dialect and passes for cache-aware distribution of tensors (single- and multi-core) and type-aware instructions (VNNI, BFDOT, BFMMLA), including propagation of shapes across the entire function; (2) a linear algebra pipeline, including tile, fuse and bufferization strategies, to turn model-level IR into hardware-friendly tile calls; (3) A…
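
The type-aware instructions named in the abstract expect operands in an interleaved, type-specific layout. The hypothetical sketch below is not code from tpp-mlir; it shows one such layout. Groups of `f` consecutive K-rows of B are interleaved so that the `f` reduction operands for each output column sit next to each other in memory. `vnni_pack` and `matmul_vnni` are illustrative names. The default `f = 2` is an assumption matching the pair-wise grouping used by bf16 dot-product instructions. BFMMLA uses a different, matrix-shaped tiling that this sketch does not model.

```python
from typing import List

Matrix = List[List[float]]
Vnni = List[List[List[float]]]  # indexed [K // f][N][f]


def vnni_pack(b: Matrix, f: int = 2) -> Vnni:
    """Interleave groups of f consecutive K-rows of a K x N matrix.

    After packing, the f values that one f-wise dot-product step consumes
    for output column j are adjacent: bv[kk][j] == [b[kk*f + p][j] for p].
    """
    k, n = len(b), len(b[0])
    assert k % f == 0, "sketch assumes K is a multiple of the grouping factor"
    return [
        [[b[kk * f + p][j] for p in range(f)] for j in range(n)]
        for kk in range(k // f)
    ]


def matmul_vnni(a: Matrix, bv: Vnni) -> Matrix:
    """C = A @ B with B supplied in the interleaved layout.

    The inner sum over p models one multi-element dot-product step that
    accumulates into a wider (e.g. fp32) result.
    """
    f, n = len(bv[0][0]), len(bv[0])
    c = [[0.0] * n for _ in range(len(a))]
    for i in range(len(a)):
        for kk in range(len(bv)):
            for j in range(n):
                c[i][j] += sum(a[i][kk * f + p] * bv[kk][j][p] for p in range(f))
    return c
```

For instance, `vnni_pack([[1, 2], [3, 4], [5, 6], [7, 8]])` yields `[[[1, 3], [2, 4]], [[5, 7], [6, 8]]]`. On x86, the int8 VNNI instructions consume groups of four, which would correspond to `f = 4` here.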

Code & Models

Repositories

plaidml/tpp-mlir (official)

Taxonomy

Topics: Parallel Computing and Optimization Techniques

Methods: Lib