Escoin: Efficient Sparse Convolutional Neural Network Inference on GPUs

Xuhao Chen

arXiv:1802.10280 · cs.DC · April 5, 2019 · 20 cites

TL;DR

This paper introduces Escort, a GPU-optimized method that computes sparse convolutions directly instead of lowering them to matrix multiplication, so that pruned (highly sparse) networks run inference faster rather than slower.

Contribution

Escort presents a novel direct sparse convolution approach on GPUs, overcoming inefficiencies of traditional lowering methods and optimizing parallelism and locality.
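
As a rough illustration of what "direct sparse convolution" means here, the sketch below stores pruned filter weights in a CSR-like layout (one row of nonzeros per output channel) and accumulates only over those nonzeros, without building a lowered matrix. It is plain Python for readability and is not the paper's CUDA kernel; the function names, the tuple-based layout, and the stride-1, no-padding setting are assumptions made for this example.

```python
def to_csr(weights):
    """Convert dense filters [M][C][R][S] into CSR-like rows.

    Each output channel gets a list of (c, r, s, value) tuples that
    holds only its nonzero weights. (Hypothetical layout for illustration.)
    """
    rows = []
    for filt in weights:
        nonzeros = []
        for c, plane in enumerate(filt):
            for r, line in enumerate(plane):
                for s, w in enumerate(line):
                    if w != 0:
                        nonzeros.append((c, r, s, w))
        rows.append(nonzeros)
    return rows


def direct_sparse_conv(inp, csr_rows, R, S):
    """Convolve inp [C][H][W] with sparse filters (stride 1, no padding).

    Returns an output of shape [M][H-R+1][W-S+1].
    """
    H, W = len(inp[0]), len(inp[0][0])
    out_h, out_w = H - R + 1, W - S + 1
    out = [[[0.0] * out_w for _ in range(out_h)] for _ in csr_rows]
    for m, nonzeros in enumerate(csr_rows):
        # Pruned (zero) weights never appear here, so no work is spent on them.
        for c, r, s, w in nonzeros:
            for y in range(out_h):
                for x in range(out_w):
                    out[m][y][x] += w * inp[c][y + r][x + s]
    return out


# One 3x3 input channel and one 2x2 filter with half of its weights pruned.
example_input = [[[1, 2, 3], [4, 5, 6], [7, 8, 9]]]
example_weights = [[[[1, 0], [0, 2]]]]
example_output = direct_sparse_conv(example_input, to_csr(example_weights), 2, 2)
```

In this toy case only two of the four filter weights are visited. On a GPU, the inner loops over output positions are the natural source of parallelism, which is presumably where the parallelism and locality optimizations mentioned above come in.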

Findings

01. Escort improves sparse convolution speed by up to 3.07x.

02. Inference speed increases by up to 1.69x over existing libraries.

03. Method maintains high accuracy while enhancing efficiency.

Abstract

Deep neural networks have achieved remarkable accuracy in many artificial intelligence applications, e.g. computer vision, at the cost of a large number of parameters and high computational complexity. Weight pruning can compress DNN models by removing redundant parameters in the networks, but it brings sparsity in the weight matrix, and therefore makes the computation inefficient on GPUs. Although pruning can remove more than 80% of the weights, it actually hurts inference performance (speed) when running models on GPUs. Two major problems cause this unsatisfactory performance on GPUs. First, lowering convolution onto matrix multiplication reduces data reuse opportunities and wastes memory bandwidth. Second, the sparsity brought by pruning makes the computation irregular, which leads to inefficiency when running on massively parallel GPUs. To overcome these two limitations, we…
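
To make the first problem concrete, here is a minimal sketch of lowering (often called im2col) in plain Python. It shows how each input element is copied into several columns of the lowered matrix, which is the duplication that costs data reuse and memory bandwidth. The function names and the stride-1, no-padding setting are assumptions for illustration, not code from the paper.

```python
def im2col(inp, R, S):
    """Lower an input [C][H][W] to a (C*R*S) x (out_h*out_w) matrix.

    Row (c, r, s) holds the input values that filter tap (c, r, s)
    touches at every output position (stride 1, no padding).
    """
    C, H, W = len(inp), len(inp[0]), len(inp[0][0])
    out_h, out_w = H - R + 1, W - S + 1
    lowered = []
    for c in range(C):
        for r in range(R):
            for s in range(S):
                lowered.append([inp[c][y + r][x + s]
                                for y in range(out_h)
                                for x in range(out_w)])
    return lowered


def lowering_blowup(C, H, W, R, S):
    """Ratio of lowered-matrix entries to original input entries."""
    out_h, out_w = H - R + 1, W - S + 1
    return (C * R * S * out_h * out_w) / (C * H * W)


# A single 3x3 channel lowered for a 2x2 filter: 9 inputs become 16 entries.
example = [[[1, 2, 3], [4, 5, 6], [7, 8, 9]]]
lowered = im2col(example, 2, 2)
```

For this example the centre value 5 appears in all four rows of `lowered`, and `lowering_blowup(1, 5, 5, 3, 3)` evaluates to 3.24, meaning a 3x3 filter on a 5x5 input already more than triples the data that must be moved.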

Code & Models

Repositories

chenxuhao/caffe-escoin

Taxonomy

Topics: Advanced Neural Network Applications · Generative Adversarial Networks and Image Synthesis · Human Pose and Action Recognition

Methods: Pruning · SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings · Convolution