Fluid Batching: Exit-Aware Preemptive Serving of Early-Exit Neural Networks on Edge NPUs
Alexandros Kouris, Stylianos I. Venieris, Stefanos Laskaridis, Nicholas D. Lane

TL;DR
This paper introduces Fluid Batching and an early-exit-aware scheduling algorithm for edge NPUs, significantly improving latency and SLO satisfaction in dynamic, multi-device neural network inference scenarios.
Contribution
It proposes novel hardware design dimensions and a scheduling algorithm to enhance run-time adaptability and efficiency for early-exit neural networks on edge NPUs.
Findings
Achieves a 1.97x reduction in average latency.
Attains a 6.7x improvement in tail-latency SLO satisfaction.
Demonstrates effective handling of dynamic, multi-device inference loads.
Abstract
With deep neural networks (DNNs) emerging as the backbone in a multitude of computer vision tasks, their adoption in real-world applications broadens continuously. Given the abundance and omnipresence of smart devices in the consumer landscape, "smart ecosystems" are being formed where sensing happens concurrently rather than standalone. This is shifting the on-device inference paradigm towards deploying centralised neural processing units (NPUs) at the edge, where multiple devices (e.g. in smart homes or autonomous vehicles) can stream their data for processing with dynamic rates. While this provides enhanced potential for input batching, naive solutions can lead to subpar performance and quality of experience, especially under spiking loads. At the same time, the deployment of dynamic DNNs, comprising stochastic computation graphs (e.g. early-exit (EE) models), introduces a new…
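One way to see why batching and early-exit models interact is to track how many samples of a batch are still in flight after each exit. The sketch below is a hypothetical illustration only, not the paper's scheduling algorithm: the function name `expected_occupancy` and the exit rates are assumptions chosen for the example.

```python
def expected_occupancy(batch_size, exit_rates):
    """Expected number of samples still in the batch after each exit.

    exit_rates[i] is the assumed fraction of samples reaching exit i
    that leave the network there (illustrative values, not measured).
    """
    remaining = float(batch_size)
    occupancy = []
    for rate in exit_rates:
        # Samples that exit early no longer occupy a batch slot downstream.
        remaining *= 1.0 - rate
        occupancy.append(remaining)
    return occupancy


# A batch of 8 with two exits, each releasing half of the samples reaching it.
print(expected_occupancy(8, [0.5, 0.5]))  # [4.0, 2.0]
```

Under these assumed rates, the layers after the second exit would run with only a quarter of the original batch, which is the kind of under-utilisation that a static batching scheme leaves on the table.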
Taxonomy
Topics: Age of Information Optimization · IoT and Edge/Fog Computing · Advanced Neural Network Applications
