Blockbuster, Part 1: Block-level AI Operator Fusion

Ofer Dekel

arXiv:2505.07829 · cs.LG · May 14, 2025


TL;DR

Blockbuster is a framework for AI operator fusion on multiprocessor architectures with a tiered memory hierarchy. It combines a graph-based workload representation with a rule-based fusion algorithm that models data movement between memory tiers.

Contribution

The paper presents a rule-based fusion algorithm that explicitly models data movement between memory tiers, which lets it fuse complex AI operations.

Findings

01. Rediscovered the Flash Attention kernel

02. Fused LayerNorm with matrix multiplication

03. Fused multiple operations into a single mega-kernel
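
For readers unfamiliar with the first finding, the following is a minimal sketch of the blockwise "online softmax" idea that Flash Attention is generally known for. It is not taken from the paper and is not the kernel Blockbuster produces. The function names, the single-query simplification, and the omission of the usual 1/sqrt(d) scaling are all assumptions made for illustration.

```python
import math

def naive_attention(q, K, V):
    """Reference: softmax(q . K^T) V for a single query vector q."""
    scores = [sum(qi * ki for qi, ki in zip(q, k)) for k in K]
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    dim = len(V[0])
    return [sum(w * v[j] for w, v in zip(weights, V)) / total for j in range(dim)]

def blockwise_attention(q, K, V, block_size=2):
    """Same result, but K and V are visited one block at a time.

    Only a running max, a running normalizer, and a running weighted sum
    survive between blocks, so the full score vector is never stored.
    """
    running_max = -math.inf
    running_sum = 0.0
    acc = [0.0] * len(V[0])
    for start in range(0, len(K), block_size):
        k_blk = K[start:start + block_size]
        v_blk = V[start:start + block_size]
        scores = [sum(qi * ki for qi, ki in zip(q, k)) for k in k_blk]
        new_max = max(running_max, max(scores))
        # Rescale what was accumulated so far to the new running max.
        scale = math.exp(running_max - new_max)
        running_sum *= scale
        acc = [a * scale for a in acc]
        for s, v in zip(scores, v_blk):
            w = math.exp(s - new_max)
            running_sum += w
            acc = [a + w * vj for a, vj in zip(acc, v)]
        running_max = new_max
    return [a / running_sum for a in acc]

# Illustrative data (hypothetical values).
q = [0.5, -1.0]
K = [[1.0, 2.0], [0.0, 1.0], [3.0, -1.0], [-2.0, 0.5], [1.5, 1.5]]
V = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0], [-1.0, 3.0], [0.5, 0.5]]
reference = naive_attention(q, K, V)
blocked = blockwise_attention(q, K, V, block_size=2)
```

The two functions agree up to floating-point rounding for any block size, which is what makes it possible to fuse the score computation, the softmax, and the weighted sum into one pass over blocks of K and V.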

Abstract

Blockbuster is a framework for AI operator fusion in inference programs. The Blockbuster framework is compatible with any multiprocessor architecture that has a tiered memory hierarchy, including GPUs, multi-core CPUs, and some AI accelerator chips. It includes a graph-based representation for AI workloads, called a block program, which explicitly models how blocks of data move between the memory tiers. It also includes an operator fusion procedure, which is made up of a candidate selection algorithm and a fusion algorithm that fuses each individual candidate. This two-algorithm structure makes Blockbuster especially suitable for large AI programs. The current paper focuses on the fusion algorithm, which is a rule-based technique. While the literature is full of previous rule-based fusion algorithms, what sets our algorithm apart is its direct modeling of data movement between memory…
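
To make the abstract's point about data movement concrete, here is a toy sketch that counts block transfers to and from a slow memory tier for an unfused and a fused chain of elementwise operators. It is not the paper's block program representation or its fusion rules. The `GlobalMemory` class, the `run_unfused` and `run_fused` helpers, and the restriction to elementwise operators are all hypothetical simplifications.

```python
class GlobalMemory:
    """Toy stand-in for the slow memory tier; counts block transfers."""

    def __init__(self, blocks):
        self.data = {"x": dict(enumerate(blocks))}
        self.loads = 0
        self.stores = 0

    def load(self, name, i):
        self.loads += 1
        return self.data[name][i]

    def store(self, name, i, block):
        self.stores += 1
        self.data.setdefault(name, {})[i] = block

def run_unfused(mem, ops, num_blocks):
    """One kernel per operator: every intermediate goes through global memory."""
    src = "x"
    for step, op in enumerate(ops):
        dst = f"tmp{step}"
        for i in range(num_blocks):
            block = mem.load(src, i)
            mem.store(dst, i, [op(v) for v in block])
        src = dst
    return [mem.data[src][i] for i in range(num_blocks)]

def run_fused(mem, ops, num_blocks):
    """One fused kernel: each block is loaded once and stored once."""
    for i in range(num_blocks):
        block = mem.load("x", i)
        for op in ops:
            # Intermediates stay in the fast local tier (a Python variable here).
            block = [op(v) for v in block]
        mem.store("y", i, block)
    return [mem.data["y"][i] for i in range(num_blocks)]

blocks = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
ops = [lambda v: v * 2, lambda v: v + 1, lambda v: v * v]

unfused_mem = GlobalMemory(blocks)
unfused_out = run_unfused(unfused_mem, ops, len(blocks))

fused_mem = GlobalMemory(blocks)
fused_out = run_fused(fused_mem, ops, len(blocks))
```

With three blocks and three operators, the unfused run performs 9 loads and 9 stores, while the fused run performs 3 loads and 3 stores and produces the same output. Real fusion decisions involve reductions, matrix multiplications, and capacity limits that this toy ignores.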


Taxonomy

Methods: Softmax · Attention Is All You Need · Root Mean Square Layer Normalization