Woodpecker-DL: Accelerating Deep Neural Networks via Hardware-Aware Multifaceted Optimizations

Yongchao Liu; Yue Jin; Yong Chen; Teng Teng; Hang Ou; Rui Zhao; Yao Zhang

arXiv:2008.04567·cs.DC·August 12, 2020

Open Access

TL;DR

Woodpecker-DL is a hardware-aware framework that optimizes deep neural network inference through graph optimization, automated searches, DSL compilation, and system-level exploration, achieving significant speedups on GPU hardware.

Contribution

It introduces a novel hardware-aware optimization framework with automated search methods and a DSL compiler for efficient inference acceleration.

Findings

01. Achieved up to 5.40x speedup over cuDNN on convolution operators.

02. Ran up to 1.18x faster than TensorRT for end-to-end inference.

03. Demonstrated effectiveness on Tesla P100 GPU with multiple optimization techniques.

Abstract

Accelerating deep model training and inference is crucial in practice. Existing deep learning frameworks usually concentrate on optimizing training speed and pay less attention to inference-specific optimizations. Yet model inference differs from training in terms of computation: for example, parameters are updated at each gradient step during training but stay fixed during inference. These characteristics of model inference open new opportunities for its optimization. In this paper, we propose a hardware-aware optimization framework, namely Woodpecker-DL (WPK), to accelerate inference by jointly applying optimizations from the perspectives of graph optimization, automated searches, domain-specific language (DSL) compiler techniques, and system-level exploration. In WPK, we investigated two new automated search approaches based on genetic algorithm…
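
To make the idea of a genetic-algorithm-based automated search concrete, here is a minimal Python sketch. It is not the paper's method, whose details are cut off in the abstract above. The search space (`tile_x`, `tile_y`, `unroll`), the `cost` function, and all helper names are hypothetical stand-ins. A real tuner would replace `cost` with a measured kernel runtime on the target GPU.

```python
import random

# Hypothetical tuning knobs for a convolution kernel (illustrative assumption).
SPACE = {
    "tile_x": [1, 2, 4, 8, 16, 32],
    "tile_y": [1, 2, 4, 8, 16, 32],
    "unroll": [1, 2, 4, 8],
}

def cost(cfg):
    # Stand-in for a measured runtime; lower is better.
    # A real tuner would compile and time the kernel here.
    return abs(cfg["tile_x"] - 16) + abs(cfg["tile_y"] - 8) + abs(cfg["unroll"] - 4)

def random_config(rng):
    # Draw one value per knob uniformly at random.
    return {k: rng.choice(v) for k, v in SPACE.items()}

def crossover(a, b, rng):
    # Take each knob from one of the two parents.
    return {k: rng.choice([a[k], b[k]]) for k in SPACE}

def mutate(cfg, rng, rate=0.2):
    # Resample each knob with probability `rate`.
    return {k: (rng.choice(SPACE[k]) if rng.random() < rate else v)
            for k, v in cfg.items()}

def genetic_search(generations=30, pop_size=20, elite=4, seed=0):
    rng = random.Random(seed)
    population = [random_config(rng) for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=cost)
        parents = population[:elite]  # elitism: best configs survive unchanged
        children = [
            mutate(crossover(rng.choice(parents), rng.choice(parents), rng), rng)
            for _ in range(pop_size - elite)
        ]
        population = parents + children
    return min(population, key=cost)

best = genetic_search()
print(best, cost(best))
```

Because the best configurations are carried over unchanged each generation, the best cost found never gets worse as the search proceeds. The quality of the final result still depends on the population size, the mutation rate, and how faithfully the cost function reflects real hardware behavior.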

Taxonomy

Topics: Advanced Neural Network Applications · Parallel Computing and Optimization Techniques · Adversarial Robustness in Machine Learning

Methods: Convolution