Automatic Horizontal Fusion for GPU Kernels

Ao Li, Bojian Zheng, Gennady Pekhimenko, and Fan Long

arXiv:2007.01277 · cs.DC · July 3, 2020 · 5 citations

Open Access

TL;DR

This paper introduces automatic horizontal fusion, a GPU kernel optimization that increases thread-level parallelism to hide instruction latencies, and HFuse, a source-to-source CUDA compiler that implements it. The reported speedups range from 2.5% to 60.8%.

Contribution

The paper proposes horizontal fusion, which complements existing kernel fusion techniques by increasing thread-level parallelism rather than eliminating intermediate data round trips, and implements it in the HFuse compiler.

Findings

1. Horizontal fusion speeds up GPU kernels by 2.5%-60.8%.
2. Horizontal fusion benefits kernels with diverse resource requirements.
3. HFuse effectively automates the horizontal fusion process.

Abstract

We present automatic horizontal fusion, a novel optimization technique that complements the standard kernel fusion techniques for GPU programs. Unlike standard fusion, whose goal is to eliminate intermediate data round trips, our horizontal fusion technique aims to increase thread-level parallelism to hide instruction latencies. We also present HFuse, a new source-to-source CUDA compiler that implements automatic horizontal fusion. Our experimental results show that horizontal fusion can reduce running time by 2.5%-60.8%. Our results reveal that horizontal fusion is especially beneficial for fusing kernels with instructions that require different kinds of GPU resources (e.g., a memory-intensive kernel and a compute-intensive kernel).
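To make the idea in the abstract concrete, here is a minimal hand-written CUDA sketch of what fusing two independent kernels "side by side" might look like. It is an illustration under stated assumptions, not HFuse output: the two kernel bodies (`scale_body`, `iterate_body`), the helper `fused_matches_reference`, and the scheme of splitting each thread block into two thread ranges by a parameter `d1` are all hypothetical names and choices introduced here. Neither body uses a block-wide barrier. Kernels that call `__syncthreads()` would need extra care, because a plain barrier inside a branch taken by only part of the block is unsafe.

```cuda
#include <cuda_runtime.h>
#include <cmath>
#include <vector>

// Body of a hypothetical memory-bound kernel: out[i] = 2 * in[i].
__device__ void scale_body(const float* in, float* out, int n, int tid, int nthreads) {
    for (int i = tid; i < n; i += nthreads) out[i] = 2.0f * in[i];
}

// Body of a hypothetical compute-bound kernel: many arithmetic steps per element.
__device__ void iterate_body(float* data, int n, int tid, int nthreads) {
    for (int i = tid; i < n; i += nthreads) {
        float x = data[i];
        for (int k = 0; k < 256; ++k) x = x * 0.5f + 1.0f;
        data[i] = x;
    }
}

// Horizontally fused kernel (illustrative): each block holds d1 + d2 threads.
// Threads [0, d1) run the first body; threads [d1, d1 + d2) run the second.
// Assumption: d1 is a multiple of the warp size (32), so no warp straddles the split.
__global__ void fused_kernel(const float* in, float* out, int n1,
                             float* data, int n2, int d1) {
    int d2 = blockDim.x - d1;
    if (threadIdx.x < d1) {
        scale_body(in, out, n1, blockIdx.x * d1 + threadIdx.x, gridDim.x * d1);
    } else {
        iterate_body(data, n2, blockIdx.x * d2 + (threadIdx.x - d1), gridDim.x * d2);
    }
}

// Launches the fused kernel once and checks both results against a CPU reference.
bool fused_matches_reference() {
    const int n1 = 1 << 16, n2 = 1 << 14, d1 = 128, d2 = 128, blocks = 64;
    std::vector<float> h_in(n1), h_out(n1), h_data(n2);
    for (int i = 0; i < n1; ++i) h_in[i] = 0.25f * (i % 100);
    for (int i = 0; i < n2; ++i) h_data[i] = 0.5f * (i % 7);
    std::vector<float> ref_data = h_data;  // keep the inputs for the CPU reference

    float *d_in = nullptr, *d_out = nullptr, *d_data = nullptr;
    cudaMalloc(&d_in, n1 * sizeof(float));
    cudaMalloc(&d_out, n1 * sizeof(float));
    cudaMalloc(&d_data, n2 * sizeof(float));
    cudaMemcpy(d_in, h_in.data(), n1 * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(d_data, h_data.data(), n2 * sizeof(float), cudaMemcpyHostToDevice);

    // One launch replaces two: the block size is the sum of the two original sizes.
    fused_kernel<<<blocks, d1 + d2>>>(d_in, d_out, n1, d_data, n2, d1);
    cudaError_t err = cudaDeviceSynchronize();

    cudaMemcpy(h_out.data(), d_out, n1 * sizeof(float), cudaMemcpyDeviceToHost);
    cudaMemcpy(h_data.data(), d_data, n2 * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);
    cudaFree(d_data);
    if (err != cudaSuccess) return false;

    for (int i = 0; i < n1; ++i)
        if (h_out[i] != 2.0f * h_in[i]) return false;
    for (int i = 0; i < n2; ++i) {
        float x = ref_data[i];
        for (int k = 0; k < 256; ++k) x = x * 0.5f + 1.0f;
        if (std::fabs(h_data[i] - x) > 1e-5f) return false;
    }
    return true;
}
```

Calling `fused_matches_reference()` from a `main` function runs one fused launch in place of two separate ones. The point of the split is that warps from the memory-bound range and warps from the compute-bound range are resident on the same multiprocessor at the same time, which gives the scheduler more independent work to issue while other warps wait. Whether this helps in practice depends on the kernels and on how the threads are divided between them, which is the kind of decision the abstract says HFuse automates.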

Taxonomy

Topics: Advanced Neural Network Applications · Parallel Computing and Optimization Techniques · Advanced Image and Video Retrieval Techniques