Fluid Batching: Exit-Aware Preemptive Serving of Early-Exit Neural Networks on Edge NPUs

Alexandros Kouris; Stylianos I. Venieris; Stefanos Laskaridis; Nicholas D. Lane

arXiv:2209.13443·cs.LG·August 8, 2023·6 cites

Open Access

TL;DR

This paper introduces Fluid Batching and an early-exit-aware scheduling algorithm for edge NPUs, significantly improving latency and SLO satisfaction in dynamic, multi-device neural network inference scenarios.

Contribution

It proposes novel hardware design dimensions and a scheduling algorithm to enhance run-time adaptability and efficiency for early-exit neural networks on edge NPUs.

Findings

01

Achieves a 1.97x reduction in average latency.

02

Attains a 6.7x improvement in tail-latency SLO satisfaction.

03

Demonstrates effective handling of dynamic, multi-device inference loads.

Abstract

With deep neural networks (DNNs) emerging as the backbone in a multitude of computer vision tasks, their adoption in real-world applications broadens continuously. Given the abundance and omnipresence of smart devices in the consumer landscape, "smart ecosystems" are being formed where sensing happens concurrently rather than standalone. This is shifting the on-device inference paradigm towards deploying centralised neural processing units (NPUs) at the edge, where multiple devices (e.g. in smart homes or autonomous vehicles) can stream their data for processing with dynamic rates. While this provides enhanced potential for input batching, naive solutions can lead to subpar performance and quality of experience, especially under spiking loads. At the same time, the deployment of dynamic DNNs, comprising stochastic computation graphs (e.g. early-exit (EE) models), introduces a new…

Taxonomy

Topics: Age of Information Optimization · IoT and Edge/Fog Computing · Advanced Neural Network Applications