Woodpecker-DL: Accelerating Deep Neural Networks via Hardware-Aware Multifaceted Optimizations
Yongchao Liu, Yue Jin, Yong Chen, Teng Teng, Hang Ou, Rui Zhao, Yao Zhang

TL;DR
Woodpecker-DL is a hardware-aware framework that optimizes deep neural network inference through graph optimization, automated searches, DSL compilation, and system-level exploration. On a Tesla P100 GPU it achieves up to 5.40x speedup over cuDNN on convolution operators and runs up to 1.18x faster than TensorRT for end-to-end inference.
Contribution
It introduces a novel hardware-aware optimization framework with automated search methods and a DSL compiler for efficient inference acceleration.
Findings
Achieved up to 5.40x speedup over cuDNN on convolution operators.
Ran up to 1.18x faster than TensorRT for end-to-end inference.
Demonstrated effectiveness on a Tesla P100 GPU by combining multiple optimization techniques.
Abstract
Accelerating deep model training and inference is crucial in practice. Existing deep learning frameworks usually concentrate on optimizing training speed and pay less attention to inference-specific optimizations. In fact, model inference differs from training in terms of computation, e.g., parameters are refreshed at each gradient update step during training but kept invariant during inference. These special characteristics of model inference open new opportunities for its optimization. In this paper, we propose a hardware-aware optimization framework, namely Woodpecker-DL (WPK), to accelerate inference by taking advantage of multiple joint optimizations from the perspectives of graph optimization, automated searches, domain-specific language (DSL) compiler techniques and system-level exploration. In WPK, we investigated two new automated search approaches based on genetic algorithm…
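To make the point about invariant parameters concrete, here is a minimal sketch of one well-known inference-only optimization: folding a batch-normalization step into the preceding linear layer. It is a generic illustration and not taken from the paper; the abstract does not say whether WPK applies this particular transformation, and the function names and example values below are hypothetical.

```python
import math

def linear(weights, bias, x):
    # y = W x + b for a single input vector x
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]

def batch_norm(y, gamma, beta, mean, var, eps=1e-5):
    # Inference-mode batch norm: the statistics are fixed constants
    return [g * (v - m) / math.sqrt(s + eps) + b
            for v, g, b, m, s in zip(y, gamma, beta, mean, var)]

def fold_batch_norm(weights, bias, gamma, beta, mean, var, eps=1e-5):
    # Because every parameter is constant at inference time, the affine
    # batch-norm step can be merged into the weights once, ahead of time.
    scales = [g / math.sqrt(s + eps) for g, s in zip(gamma, var)]
    folded_w = [[sc * w for w in row] for sc, row in zip(scales, weights)]
    folded_b = [sc * (b - m) + be
                for sc, b, m, be in zip(scales, bias, mean, beta)]
    return folded_w, folded_b

# Hypothetical example values (assumptions for illustration only)
W = [[0.5, -1.0], [2.0, 0.25]]
b = [0.1, -0.3]
gamma, beta = [1.5, 0.8], [0.2, -0.1]
mean, var = [0.4, -0.2], [1.2, 0.9]
x = [1.0, 2.0]

unfused = batch_norm(linear(W, b, x), gamma, beta, mean, var)
folded_W, folded_b = fold_batch_norm(W, b, gamma, beta, mean, var)
fused = linear(folded_W, folded_b, x)
```

The two-step and folded computations agree up to floating-point rounding, but the folded form runs one operator instead of two. During training this rewrite would not be valid, since the weights and statistics change at every update step.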
Taxonomy
Topics: Advanced Neural Network Applications · Parallel Computing and Optimization Techniques · Adversarial Robustness in Machine Learning
Methods: Convolution
