Improving the Performance of DNN-based Software Services using Automated Layer Caching

Mohammadamin Abedi; Yani Ioannou; Pooyan Jamshidi; Hadi Hemmati

arXiv:2209.08625 · cs.LG · September 20, 2022 · 1 citation

Open Access

TL;DR

This paper introduces an automated online layer caching mechanism for DNNs that enables early exits during inference, reducing computational complexity by up to 58% and inference latency by up to 46% with minimal loss of accuracy.

Contribution

It presents a novel online caching approach using self-distillation and early exits, suitable for pre-trained models and real-time applications.
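The early-exit idea can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `infer_with_early_exit`, the `exit_heads` mapping, and the 0.9 threshold are all hypothetical, and the lightweight exit heads (which the paper describes as trained via self-distillation) are simply assumed to exist here. The sketch only shows the control flow: after selected layers, a small head predicts the output, and inference stops as soon as its confidence clears the threshold.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

Vector = List[float]
Layer = Callable[[Vector], Vector]


def softmax(logits: Sequence[float]) -> Vector:
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def infer_with_early_exit(
    x: Vector,
    layers: Sequence[Layer],
    exit_heads: Dict[int, Layer],   # hypothetical: layer index -> small classifier head
    threshold: float = 0.9,         # hypothetical confidence cutoff
) -> Tuple[int, float, int]:
    """Return (predicted_class, confidence, layers_executed).

    Assumes the final layer of the backbone outputs class logits.
    """
    h = x
    for i, layer in enumerate(layers):
        h = layer(h)
        head = exit_heads.get(i)
        if head is None:
            continue  # no exit attached after this layer
        probs = softmax(head(h))
        conf = max(probs)
        if conf >= threshold:
            # Confident enough: skip the remaining layers.
            return probs.index(conf), conf, i + 1
    # No exit fired: fall back to the full model's prediction.
    probs = softmax(h)
    conf = max(probs)
    return probs.index(conf), conf, len(layers)


if __name__ == "__main__":
    # Toy three-layer "backbone" and one exit head after layer 0.
    toy_layers = [
        lambda h: [2 * v for v in h],
        lambda h: [v + 1 for v in h],
        lambda h: list(h),
    ]
    toy_heads = {0: lambda h: list(h)}
    print(infer_with_early_exit([3.0, 0.0], toy_layers, toy_heads))  # easy input, exits early
    print(infer_with_early_exit([0.1, 0.0], toy_layers, toy_heads))  # hard input, runs all layers
```

In this toy setup, the savings come entirely from how often the threshold is met; a lower threshold exits more often at the cost of more potential disagreement with the full model.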

Findings

01 Reduced computational complexity by up to 58%.

02 Improved inference latency by up to 46%.

03 Maintained accuracy with minimal loss.

Abstract

Deep Neural Networks (DNNs) have become an essential component in many application domains including web-based services. A variety of these services require high throughput and (close to) real-time features, for instance, to respond or react to users' requests or to process a stream of incoming data on time. However, the trend in DNN design is toward larger models with many layers and parameters to achieve more accurate results. Although these models are often pre-trained, the computational complexity in such large models can still be relatively significant, hindering low inference latency. Implementing a caching mechanism is a typical systems engineering solution for speeding up a service response time. However, traditional caching is often not suitable for DNN-based services. In this paper, we propose an end-to-end automated solution to improve the performance of DNN-based services in…

Taxonomy

Topics: IoT and Edge/Fog Computing · Advanced Neural Network Applications · Data Stream Mining Techniques

Methods: Early exiting using confidence measures