Accelerating Deep Learning Inference with Cross-Layer Data Reuse on GPUs

Xueying Wang, Guangli Li, Xiao Dong, Jiansong Li, Lei Liu, and Xiaobing Feng

arXiv:2007.06000 · cs.DC · July 30, 2020

TL;DR

This paper introduces a GPU-based layer fusion technique for CNNs that improves cross-layer data reuse and reduces inference time. It achieves an average speedup of 2.02x on layer structures from state-of-the-art CNNs.

Contribution

It proposes three fusion modes (straight, merge, and split) and an approach for generating efficient fused code that exploits cross-layer data reuse in CNN inference on GPUs.
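To make the three fusion modes concrete, here is a minimal sketch that labels producer/consumer groups in a small layer graph. The definitions it encodes are assumptions inferred from the mode names, not taken from the paper. Straight is treated as a one-to-one chain, split as one layer feeding several consumers, and merge as several layers feeding one consumer. The function name `classify_fusion_modes` is hypothetical. The sketch only finds candidate groups. It does not decide whether fusing them is profitable.

```python
from collections import defaultdict


def classify_fusion_modes(edges):
    """Label candidate fusion groups in a layer DAG given as (producer, consumer) edges.

    Assumed (hypothetical) mode definitions, for illustration only:
      straight: producer has exactly one consumer, which has exactly one producer
      split:    one producer feeds several consumers
      merge:    several producers feed one consumer
    """
    consumers = defaultdict(list)  # layer -> layers that read its output
    producers = defaultdict(list)  # layer -> layers whose output it reads
    for src, dst in edges:
        consumers[src].append(dst)
        producers[dst].append(src)

    modes = []
    for src, dsts in consumers.items():
        if len(dsts) > 1:
            modes.append(("split", src, tuple(dsts)))
        elif len(producers[dsts[0]]) == 1:
            modes.append(("straight", src, dsts[0]))
    for dst, srcs in producers.items():
        if len(srcs) > 1:
            modes.append(("merge", tuple(srcs), dst))
    return modes
```

Consider a SqueezeNet Fire module, in which a squeeze layer feeds both an expand1x1 and an expand3x3 layer whose outputs are concatenated. For this graph, the sketch reports one split at the squeeze layer and one merge at the concatenation. A plain chain such as conv1 → relu1 → conv2 yields only straight groups.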

Findings

01. Average speedup of 2.02x on layer structures from state-of-the-art CNNs

02. 1.57x speedup on end-to-end SqueezeNet inference

03. Effective use of the multi-level GPU memory hierarchy

Abstract

Accelerating deep learning inference is important for real-time applications. In this paper, we propose a novel method to fuse the layers of convolutional neural networks (CNNs) on Graphics Processing Units (GPUs), which applies data reuse analysis and access optimization at different levels of the memory hierarchy. To balance computation and memory access, we explore fusion opportunities in the CNN computation graph and propose three fusion modes for convolutional neural networks: straight, merge, and split. We then design an approach for generating efficient fused code that makes deeper use of multi-level memory for cross-layer data reuse. The effectiveness of our method is evaluated with network layers from state-of-the-art CNNs on two GPU platforms, NVIDIA TITAN Xp and Tesla P4. The experiments show that the average speedup is 2.02x…
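The cross-layer data reuse behind straight fusion can be illustrated with a minimal sketch. It assumes two single-channel 1-D convolutions written in plain Python, not the paper's generated GPU code. The unfused version materializes the whole intermediate result, which stands in for a round trip through global memory. The fused version computes each output tile from a small input window and keeps the intermediate result in a local buffer, which stands in for shared memory or registers. The helper names and the `tile` parameter are hypothetical.

```python
def conv1d_valid(x, w):
    """'Valid' 1-D convolution (cross-correlation, as CNN frameworks compute it)."""
    k = len(w)
    return [sum(x[i + j] * w[j] for j in range(k)) for i in range(len(x) - k + 1)]


def unfused(x, w1, w2):
    """Run the two layers separately. Layer 1's full output is materialized,
    standing in for a round trip through GPU global memory."""
    y = conv1d_valid(x, w1)
    global_traffic = 2 * len(y)  # each intermediate element written once, read once
    return conv1d_valid(y, w2), global_traffic


def fused_straight(x, w1, w2, tile):
    """Compute conv(conv(x, w1), w2) one output tile at a time (hypothetical sketch).

    The layer-1 result for each tile lives only in a small local buffer, so no
    intermediate data goes to "global memory". Returns the output and the number
    of layer-1 elements computed, which exceeds len(conv1d_valid(x, w1)) by the
    halo values recomputed at tile boundaries.
    """
    k1, k2 = len(w1), len(w2)
    n_out = len(x) - k1 - k2 + 2  # length after two 'valid' convolutions
    out, l1_computed = [], 0
    for start in range(0, n_out, tile):
        stop = min(start + tile, n_out)
        # Outputs [start, stop) need layer-1 values [start, stop + k2 - 1),
        # which in turn need inputs [start, stop + k1 + k2 - 2).
        local = conv1d_valid(x[start : stop + k1 + k2 - 2], w1)
        l1_computed += len(local)
        out.extend(conv1d_valid(local, w2))
    return out, l1_computed
```

The fused version eliminates intermediate global-memory traffic, but it recomputes k2 - 1 layer-1 values at every tile boundary. Smaller tiles therefore spend more redundant computation. This is one concrete instance of the trade-off between computation and memory access that the abstract describes.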

Taxonomy

Topics: Advanced Neural Network Applications · Adversarial Robustness in Machine Learning · Domain Adaptation and Few-Shot Learning