Calibration-Aided Edge Inference Offloading via Adaptive Model Partitioning of Deep Neural Networks

Roberto G. Pacheco; Rodrigo S. Couto; Osvaldo Simeone

arXiv:2010.16335·cs.LG·January 29, 2021

TL;DR

This paper proposes a calibration approach for early-exit DNNs to improve offloading decisions in mobile cloud inference, reducing unnecessary cloud communication and enhancing accuracy.

Contribution

It introduces a calibration method for early-exit DNNs to improve the reliability of offloading decisions in adaptive model partitioning.

Findings

01  Calibration improves offloading accuracy
02  Reduces unnecessary cloud communication
03  Enhances inference reliability

Abstract

Mobile devices can offload deep neural network (DNN)-based inference to the cloud, overcoming local hardware and energy limitations. However, offloading adds communication delay, which increases the overall inference time, so it should be used only when needed. One approach to this problem is adaptive model partitioning based on early-exit DNNs. Inference starts at the mobile device, and an intermediate layer estimates the accuracy: if the estimated accuracy is sufficient, the device makes the inference decision; otherwise, the remaining layers of the DNN run in the cloud. Thus, the device offloads the inference to the cloud only if it cannot classify a sample with high confidence. This offloading requires a correct accuracy prediction at the device. Nevertheless, DNNs are typically miscalibrated, providing overconfident decisions. This…
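The device-side decision described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the default threshold of 0.8, and the use of temperature scaling as the calibration step are all hypothetical choices made for the example, since the text above does not name the calibration method. The sketch takes the logits produced at an early exit, turns them into a confidence score, and offloads only when that score falls below the threshold.

```python
import math


def softmax(logits, temperature=1.0):
    """Softmax over raw logits, with an optional temperature.

    temperature > 1 softens the distribution, which lowers the top
    probability of an overconfident exit (assumed calibration step).
    """
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def early_exit_decision(logits, threshold=0.8, temperature=1.0):
    """Decide where inference finishes for one sample.

    Returns (where, label, confidence):
      - ("device", label, confidence) if the early exit is confident enough
      - ("cloud", None, confidence) if the remaining layers should run remotely
    """
    probs = softmax(logits, temperature)
    confidence = max(probs)
    if confidence >= threshold:
        return ("device", probs.index(confidence), confidence)
    return ("cloud", None, confidence)


# Hypothetical early-exit logits for a 3-class problem.
sample_logits = [2.0, 0.5, 0.1]

uncalibrated = early_exit_decision(sample_logits, threshold=0.7, temperature=1.0)
calibrated = early_exit_decision(sample_logits, threshold=0.7, temperature=2.0)
```

With these illustrative logits and a threshold of 0.7, the uncalibrated exit reports a confidence of about 0.73 and classifies on the device, while a temperature of 2.0 lowers the confidence to about 0.54 and sends the sample to the cloud. In practice the temperature would be fitted on held-out data rather than chosen by hand.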
