FusionStitching: Boosting Execution Efficiency of Memory Intensive Computations for DL Workloads

Guoping Long; Jun Yang; Wei Lin

arXiv:1911.11576·cs.DC·November 27, 2019

TL;DR

FusionStitching is a framework that improves GPU performance for memory-intensive deep learning computations by fusing elementwise, reduction, and fine-grained GEMM/Batched-GEMM ops into large kernels, reaching up to 5.7x speedup over the TensorFlow baseline.

Contribution

It introduces a novel fusion framework for memory-intensive DL ops, formulates fusion planning as ILP, and leverages hardware resources for efficient GPU kernel mapping.

Findings

01. Up to 5.7x speedup over TensorFlow baseline

02. Achieves 1.25x to 1.85x speedup over state-of-the-art methods

03. Average speedup of 1.4x across benchmarks and models

Abstract

Performance optimization is the art of continuously seeking a harmonious mapping between the application domain and hardware. Recent years have witnessed a surge of deep learning (DL) applications in industry. Conventional wisdom for optimizing such workloads mainly focuses on compute-intensive ops (GEMM, convolution, etc.). Yet we show in this work that the performance of memory-intensive computations is vital to end-to-end (E2E) performance in practical DL models. We propose FusionStitching, an optimization framework capable of fusing memory-intensive elementwise, reduction, and fine-grained GEMM/Batched-GEMM ops, with or without data dependences, into large computation units, then mapping and transforming them into efficient GPU kernels. We formulate the fusion plan optimization as an integer linear programming (ILP) problem, and propose a set of empirical heuristics to…
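
To make the motivation for fusion concrete, the following is a minimal sketch in plain Python, not the paper's implementation. It models a chain of two elementwise ops followed by a reduction and counts how many elements move to and from memory when each op runs as its own "kernel" versus when all three are fused into one pass. The function names `unfused` and `fused`, the particular op chain, and the traffic accounting are illustrative assumptions; real GPU kernels, and the way FusionStitching chooses and generates them, are considerably more involved.

```python
# Toy model of kernel fusion for memory-intensive ops.
# Assumption: "traffic" counts elements read from or written to main memory;
# this is an illustration, not FusionStitching's cost model.

def unfused(x, y):
    """Run multiply, add, and sum as three separate passes."""
    traffic = 0
    prod = [a * b for a, b in zip(x, y)]   # pass 1: elementwise multiply
    traffic += 2 * len(x) + len(prod)      # read x and y, write prod
    shifted = [p + 1.0 for p in prod]      # pass 2: elementwise add
    traffic += len(prod) + len(shifted)    # read prod, write shifted
    total = sum(shifted)                   # pass 3: reduction
    traffic += len(shifted) + 1            # read shifted, write one scalar
    return total, traffic


def fused(x, y):
    """Compute the same result in one pass without intermediate buffers."""
    total = 0.0
    for a, b in zip(x, y):
        total += a * b + 1.0               # intermediates stay in "registers"
    traffic = 2 * len(x) + 1               # read x and y once, write one scalar
    return total, traffic


if __name__ == "__main__":
    x = [1.0, 2.0, 3.0, 4.0]
    y = [0.5, 0.5, 0.5, 0.5]
    print(unfused(x, y))
    print(fused(x, y))
```

Both functions return the same value, but the fused version touches far fewer elements because the intermediate results are never written out and read back. That saved memory traffic is the general reason fusing memory-intensive ops can pay off.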

Taxonomy

Topics: Advanced Neural Network Applications · Ferroelectric and Negative Capacitance Devices · Parallel Computing and Optimization Techniques