Calibration-Aided Edge Inference Offloading via Adaptive Model Partitioning of Deep Neural Networks
Roberto G. Pacheco, Rodrigo S. Couto, Osvaldo Simeone

TL;DR
This paper proposes a calibration approach for early-exit DNNs to improve offloading decisions in mobile cloud inference, reducing unnecessary cloud communication and enhancing accuracy.
Contribution
It introduces a calibration method for early-exit DNNs to improve the reliability of offloading decisions in adaptive model partitioning.
Findings
Calibration improves offloading accuracy
Reduces unnecessary cloud communication
Enhances inference reliability
Abstract
Mobile devices can offload deep neural network (DNN)-based inference to the cloud, overcoming local hardware and energy limitations. However, offloading adds communication delay and thus increases the overall inference time, so it should be used only when needed. One approach to this problem is adaptive model partitioning based on early-exit DNNs. Inference starts at the mobile device, and an intermediate layer estimates the accuracy: if the estimated accuracy is sufficient, the device takes the inference decision; otherwise, the remaining layers of the DNN run in the cloud. Thus, the device offloads the inference to the cloud only if it cannot classify a sample with high confidence. This offloading requires a correct accuracy prediction at the device. Nevertheless, DNNs are typically miscalibrated, providing overconfident decisions. This…
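The offloading rule described in the abstract can be sketched in a few lines. This is an illustrative sketch only, not the authors' implementation. It assumes that the confidence estimate at the early exit is the maximum softmax probability and that calibration is done by temperature scaling, a common post-hoc method; the truncated abstract does not say which calibration method the paper uses. The function names, the threshold of 0.7, and the temperature of 2.0 are hypothetical values chosen for the example.

```python
import math


def softmax(logits, temperature=1.0):
    """Softmax over logits divided by a temperature (T > 1 softens the output)."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def early_exit_decision(logits, threshold, temperature=1.0):
    """Decide at the early exit whether to classify on the device or offload.

    Assumption: confidence is the maximum (temperature-scaled) softmax
    probability, and the device keeps the sample only if it meets the threshold.
    """
    probs = softmax(logits, temperature)
    confidence = max(probs)
    if confidence >= threshold:
        return "device", probs.index(confidence), confidence
    # Not confident enough: the remaining layers would run in the cloud.
    return "cloud", None, confidence


# Hypothetical early-exit logits for a three-class problem.
logits = [2.0, 0.5, 0.1]

# Without calibration (T = 1) the exit looks confident enough to decide locally.
uncalibrated = early_exit_decision(logits, threshold=0.7, temperature=1.0)

# With a hypothetical learned temperature T = 2 the confidence drops,
# and the same sample is offloaded instead.
calibrated = early_exit_decision(logits, threshold=0.7, temperature=2.0)

print(uncalibrated[0], round(uncalibrated[2], 3))
print(calibrated[0], round(calibrated[2], 3))
```

In practice the temperature would be fitted on a held-out validation set rather than fixed by hand. The example only shows how an overconfident exit can keep a sample on the device that a calibrated exit would send to the cloud.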
