Hierarchical adaptive control for real-time dynamic inference at the edge
Francesco Daghero, Mahyar Tourchi Moghaddam, Mikkel Baun Kjærgaard

TL;DR
This paper introduces a hierarchical adaptive control architecture for real-time ML inference at the edge, optimizing model deployment and resource management to handle data drift and resource variability efficiently.
Contribution
It presents a novel two-tier adaptive system that co-optimizes model and system decisions, preserving efficiency and satisfying latency, energy, and memory constraints without frequent redeployments.
Findings
Achieved up to 2.45x latency reduction and 2.86x energy savings.
Maintained less than 4% accuracy drop under distribution mismatch.
Demonstrated effectiveness on embedded hardware.
Abstract
Industrial systems increasingly depend on Machine Learning (ML) and operate on heterogeneous nodes that must satisfy tight latency, energy, and memory constraints. Dynamic ML models, which reconfigure their computational footprint at runtime, promise high energy efficiency and lower average latency for modest accuracy tradeoffs; however, their deployment is complex due to the additional hyperparameters they rely on. These hyperparameters, which control the tradeoff between accuracy and average latency, are often tuned on a calibration dataset that must match the test-time distribution. This assumption rarely holds in real-world scenarios, leading to suboptimal operating conditions that may even be worse than those of static models. We propose a two-tier adaptive architecture that co-optimizes model and system decisions. At the global level, a scheduler configures and deploys, for each edge node, a cascade of…
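To make the calibration issue in the abstract concrete, the following is a minimal sketch of a generic two-stage confidence-thresholded cascade, not the paper's architecture. Every name here (`cascade_predict`, `evaluate`, `calibrate_threshold`, the toy `small_model` and `large_model`) is hypothetical, and the threshold rule is an assumption chosen for illustration: a cheap model answers when it is confident, and otherwise the input is escalated to an expensive model. The threshold is the kind of hyperparameter that trades accuracy against average latency and is tuned on a calibration set.

```python
from typing import Callable, Sequence, Tuple

# Hypothetical model interface: input -> (predicted_label, confidence in [0, 1]).
Model = Callable[[float], Tuple[int, float]]


def cascade_predict(x: float, small: Model, large: Model,
                    threshold: float) -> Tuple[int, bool]:
    """Return (label, escalated); escalate when the small model is unsure."""
    label, confidence = small(x)
    if confidence >= threshold:
        return label, False
    label, _ = large(x)
    return label, True


def evaluate(xs: Sequence[float], ys: Sequence[int], small: Model,
             large: Model, threshold: float) -> Tuple[float, float]:
    """Return (accuracy, fraction of inputs escalated to the large model)."""
    correct = 0
    escalated = 0
    for x, y in zip(xs, ys):
        label, used_large = cascade_predict(x, small, large, threshold)
        correct += int(label == y)
        escalated += int(used_large)
    return correct / len(xs), escalated / len(xs)


def calibrate_threshold(xs: Sequence[float], ys: Sequence[int], small: Model,
                        large: Model, candidates: Sequence[float],
                        min_accuracy: float) -> float:
    """Pick the lowest candidate threshold whose calibration accuracy meets
    the target; fall back to the highest candidate if none does."""
    for t in sorted(candidates):
        accuracy, _ = evaluate(xs, ys, small, large, t)
        if accuracy >= min_accuracy:
            return t
    return max(candidates)


# Toy models (assumptions for illustration only): the true label is 1 when
# x > 0.2; the small model uses the wrong boundary (x > 0) and is less
# confident near it, while the large model is always right.
def small_model(x: float) -> Tuple[int, float]:
    return int(x > 0), min(1.0, 0.5 + abs(x))


def large_model(x: float) -> Tuple[int, float]:
    return int(x > 0.2), 1.0
```

A threshold calibrated this way is only as good as the calibration data. If test-time inputs cluster where the small model is confidently wrong, the same threshold escalates too rarely and accuracy drops, which is the distribution-mismatch problem the abstract describes.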
