Flexible Deep Neural Network Processing

Hokchhay Tann; Soheil Hashemi; Sherief Reda

arXiv:1801.07353 · cs.NE · January 24, 2018 · 5 cites

TL;DR

This paper introduces a flexible ensemble processing method for deep neural networks that significantly reduces inference latency with minimal accuracy loss and can be adapted to different quality and runtime requirements.

Contribution

The paper presents a novel flexible ensemble processing technique for DNNs that dynamically balances accuracy and latency, applicable to various network architectures.

Findings

01

Significant reduction in inference latency achieved

02

Minimal accuracy drop with the proposed method

03

Effective on AlexNet and ResNet-50 architectures

Abstract

The recent success of Deep Neural Networks (DNNs) has drastically improved the state of the art for many application domains. While they achieve high accuracy, deploying state-of-the-art DNNs is a challenge since they typically require billions of expensive arithmetic computations. In addition, DNNs are typically deployed in ensembles to boost accuracy, which further exacerbates the system requirements. This computational overhead is an issue for many platforms, e.g., data centers and embedded systems, with tight latency and energy budgets. In this article, we introduce a flexible DNN ensemble processing technique, which achieves a large reduction in average inference latency while incurring a small to negligible accuracy drop. Our technique is flexible in that it allows for dynamic adaptation between quality of results (QoR) and execution runtime. We demonstrate the…
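
The abstract states that the technique trades QoR against runtime but does not spell out the mechanism in this excerpt. The sketch below illustrates one common way to get such a trade-off, a confidence-gated ensemble: members are evaluated one at a time, and evaluation stops as soon as the running average of their softmax outputs reaches a tunable confidence threshold. This is an assumption for illustration, not necessarily the authors' method; the names `flexible_ensemble`, `members`, and `threshold` and the toy models are hypothetical.

```python
import math
from typing import Callable, List, Sequence, Tuple

# Hypothetical model type: maps an input vector to a list of class logits.
Model = Callable[[Sequence[float]], List[float]]


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def flexible_ensemble(
    x: Sequence[float], members: List[Model], threshold: float
) -> Tuple[int, int]:
    """Evaluate ensemble members in order, stopping early once confident.

    Returns (predicted_class, number_of_members_evaluated).
    A lower threshold stops earlier (lower latency, possibly lower QoR);
    a threshold above 1.0 always runs the full ensemble.
    """
    if not members:
        raise ValueError("ensemble needs at least one member")
    avg: List[float] = []
    for k, model in enumerate(members, start=1):
        probs = softmax(model(x))
        if not avg:
            avg = probs
        else:
            # Incremental mean of the softmax outputs of the first k members.
            avg = [a + (p - a) / k for a, p in zip(avg, probs)]
        if max(avg) >= threshold:
            break  # confident enough: skip the remaining members
    pred = max(range(len(avg)), key=lambda i: avg[i])
    return pred, k


# Usage with two toy "models" that ignore their input.
confident = lambda x: [4.0, 0.0, 0.0]  # top softmax score is about 0.96
unsure = lambda x: [0.2, 0.0, 0.1]     # nearly uniform scores

print(flexible_ensemble([0.0], [confident, unsure], threshold=0.9))
```

Ordering the cheapest member first means easy inputs exit after a single forward pass, so average latency drops, while hard inputs still receive the full ensemble average; raising `threshold` shifts the balance back toward QoR.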

Taxonomy

Topics: Advanced Neural Network Applications · Adversarial Robustness in Machine Learning · Anomaly Detection Techniques and Applications

Methods: 1x1 Convolution · Convolution · Local Response Normalization · Grouped Convolution · Dropout · Dense Connections · Max Pooling · Softmax