MCFuser: High-Performance and Rapid Fusion of Memory-Bound Compute-Intensive Operators

Zheng Zhang; Donglin Yang; Xiaobo Zhou; Dazhao Cheng

arXiv:2506.22169 · cs.DC · June 30, 2025

TL;DR

MCFuser is a framework that efficiently generates high-performance fused kernels for memory-bound compute-intensive operators, significantly improving GPU performance and reducing tuning time.

Contribution

MCFuser combines high-level tiling expressions, DAG analysis, and analytical modeling to optimize the fusion of compute-intensive operators, addressing the limited search spaces, redundant memory access, and prolonged tuning time that constrain existing approaches.

Findings

01. Achieves up to 5.9x speedup over leading compilers.

02. Reduces tuning time by more than 70-fold.

03. Demonstrates superior performance on NVIDIA GPUs.

Abstract

Operator fusion, a key technique for improving data locality and alleviating GPU memory bandwidth pressure, often fails to extend to chains of multiple compute-intensive operators because their computation throughput is already saturated. However, dynamic tensor dimension sizes can make these operators memory-bound, which calls for fused kernels. Generating such kernels is hindered by limited search spaces for fusion strategies, redundant memory access, and prolonged tuning time, which lead to sub-optimal performance and inefficient deployment. We introduce MCFuser, a pioneering framework designed to overcome these obstacles by generating high-performance fused kernels for what we define as memory-bound compute-intensive (MBCI) operator chains. Leveraging high-level tiling expressions to delineate a comprehensive search space, coupled with Directed Acyclic Graph (DAG)…
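To make the idea of a memory-bound compute-intensive chain concrete, the following is a minimal roofline-style sketch, not taken from the paper. It compares the arithmetic intensity of two chained matrix multiplications when run as separate kernels versus as one fused kernel. All function names, the shapes, and the hardware figures (`PEAK_FLOPS`, `PEAK_BANDWIDTH`) are illustrative assumptions, and the byte counts are idealized: each tensor is assumed to cross global memory exactly once.

```python
# Illustrative roofline estimate for a two-GEMM chain: C = A @ B, E = C @ D.
# Shapes: A is (m x k), B is (k x n), C is (m x n), D is (n x p), E is (m x p).
# All hardware numbers below are hypothetical, not measurements.

PEAK_FLOPS = 300e12      # assumed peak throughput, FLOP/s
PEAK_BANDWIDTH = 1.5e12  # assumed global memory bandwidth, bytes/s
RIDGE_POINT = PEAK_FLOPS / PEAK_BANDWIDTH  # FLOP per byte needed to saturate compute


def chain_flops(m, k, n, p):
    """Total floating-point operations for both GEMMs (2 per multiply-add)."""
    return 2 * m * n * k + 2 * m * p * n


def unfused_bytes(m, k, n, p, dtype_bytes=2):
    """Global memory traffic when each GEMM is a separate kernel.

    The intermediate C is written by the first kernel and read by the second.
    """
    first = m * k + k * n + m * n    # read A, read B, write C
    second = m * n + n * p + m * p   # read C, read D, write E
    return dtype_bytes * (first + second)


def fused_bytes(m, k, n, p, dtype_bytes=2):
    """Idealized traffic for a fused kernel: C stays on chip, never hits global memory."""
    return dtype_bytes * (m * k + k * n + n * p + m * p)


def is_memory_bound(flops, bytes_moved, ridge=RIDGE_POINT):
    """A kernel is memory-bound when its arithmetic intensity is below the ridge point."""
    return flops / bytes_moved < ridge


# Attention-like shapes: a small inner dimension makes each GEMM bandwidth-limited.
m, k, n, p = 512, 64, 512, 64
flops = chain_flops(m, k, n, p)
ai_unfused = flops / unfused_bytes(m, k, n, p)  # 51.2 FLOP/byte
ai_fused = flops / fused_bytes(m, k, n, p)      # 256.0 FLOP/byte
print(f"ridge={RIDGE_POINT:.0f}  unfused={ai_unfused:.1f}  fused={ai_fused:.1f}")
```

Under these assumed numbers the unfused chain falls below the ridge point of 200 FLOP/byte, while the fused chain rises above it, because the intermediate matrix no longer travels through global memory. A real fused kernel may reload tiles and move more data than this idealized count suggests.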
