FusionStitching: Boosting Execution Efficiency of Memory Intensive Computations for DL Workloads
Guoping Long, Jun Yang, Wei Lin

TL;DR
FusionStitching is an optimization framework that speeds up memory-intensive deep learning computations on GPUs by fusing elementwise, reduction, and fine-grained GEMM/Batched-GEMM ops into large kernels, reaching up to 5.7x speedup over a TensorFlow baseline.
Contribution
It introduces a fusion framework for memory-intensive DL ops, formulates fusion planning as an integer linear programming (ILP) problem, and maps the fused computation units onto efficient GPU kernels.
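To make the idea of fusion planning as a 0-1 program concrete, here is a toy sketch. It is not the paper's formulation: the chain-shaped graph, the `savings` values, the `max_ops_per_kernel` limit, and the function name `plan_fusion` are all assumptions made for illustration, and the problem is solved by exhaustive enumeration rather than by an ILP solver.

```python
from itertools import product

def plan_fusion(savings, max_ops_per_kernel):
    """Toy 0-1 program for fusing a linear chain of ops (hypothetical setup).

    savings[i] is the assumed memory traffic saved by fusing op i with op i+1.
    Decision variable x[i] in {0, 1}: 1 means edge i is fused.
    Linear constraint: every window of k consecutive edges sums to at most
    k - 1, so no kernel ends up with more than k = max_ops_per_kernel ops.
    """
    n = len(savings)
    k = max_ops_per_kernel
    best_plan, best_value = (0,) * n, 0
    # Enumerate all 0/1 assignments; fine for a toy, not for real graphs.
    for x in product((0, 1), repeat=n):
        feasible = all(sum(x[i:i + k]) <= k - 1 for i in range(n - k + 1))
        if not feasible:
            continue
        value = sum(s * xi for s, xi in zip(savings, x))
        if value > best_value:
            best_plan, best_value = x, value
    return best_plan, best_value

# Five ops in a chain, four candidate edges, at most three ops per kernel.
plan, value = plan_fusion([4, 10, 6, 8], max_ops_per_kernel=3)
print(plan, value)  # (1, 1, 0, 1) 22
```

In this example the best plan fuses ops 0 to 2 into one kernel and ops 3 to 4 into another, leaving the edge with saving 6 unfused because fusing it would exceed the assumed per-kernel limit.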
Findings
Up to 5.7x speedup over TensorFlow baseline
Achieves 1.25x to 1.85x speedup over state-of-the-art methods
Average speedup of 1.4x across benchmarks and models
Abstract
Performance optimization is the art of continuously seeking a harmonious mapping between the application domain and hardware. Recent years have witnessed a surge of deep learning (DL) applications in industry. Conventional wisdom for optimizing such workloads mainly focuses on compute-intensive ops (GEMM, convolution, etc.). Yet we show in this work that the performance of memory-intensive computations is vital to end-to-end (E2E) performance in practical DL models. We propose FusionStitching, an optimization framework capable of fusing memory-intensive elementwise, reduction, and fine-grained GEMM/Batched-GEMM ops, with or without data dependences, into large computation units, then mapping and transforming them into efficient GPU kernels. We formulate the fusion plan optimization as an integer linear programming (ILP) problem, and propose a set of empirical heuristics to…
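The benefit of fusing elementwise and reduction ops can be illustrated with a minimal sketch. This is plain Python standing in for GPU kernels, not the framework's generated code: the functions `unfused` and `fused` and the simple traffic model (one unit per element loaded from or stored to global memory) are assumptions made for illustration.

```python
def unfused(x, scale, bias):
    """Three separate 'kernels'; each intermediate is stored, then reloaded."""
    traffic = 0
    scaled = [v * scale for v in x]        # kernel 1: elementwise multiply
    traffic += 2 * len(x)                  # load x, store scaled
    shifted = [v + bias for v in scaled]   # kernel 2: elementwise add
    traffic += 2 * len(x)                  # load scaled, store shifted
    total = 0.0
    for v in shifted:                      # kernel 3: sum reduction
        total += v
    traffic += len(x) + 1                  # load shifted, store the result
    return total, traffic

def fused(x, scale, bias):
    """One 'kernel'; intermediates stay in registers and never hit memory."""
    total = 0.0
    for v in x:
        total += v * scale + bias
    traffic = len(x) + 1                   # load x once, store the result
    return total, traffic

x = [1.0, 2.0, 3.0, 4.0]
print(unfused(x, 2.0, 1.0))  # (24.0, 21)
print(fused(x, 2.0, 1.0))    # (24.0, 5)
```

Both versions compute the same value, but under this simplified model the fused version moves roughly a quarter of the data, which is the kind of saving that matters when the ops are bound by memory bandwidth rather than by arithmetic.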
Taxonomy
Topics: Advanced Neural Network Applications · Ferroelectric and Negative Capacitance Devices · Parallel Computing and Optimization Techniques
