On the Impact of Black-box Deployment Strategies for Edge AI on Latency and Model Performance
Jaskirat Singh, Emad Fallahzadeh, Bram Adams, Ahmed E. Hassan

TL;DR
This paper empirically evaluates how black-box deployment strategies (combinations of deployment operators and Edge AI tiers) affect inference latency and model accuracy, and offers MLOps engineers guidance on choosing configurations.
Contribution
It systematically compares combinations of deployment operators (Partitioning, Quantization, Early Exit) and deployment tiers (Mobile, Edge, Cloud), revealing the latency vs. accuracy trade-offs of each and the practices that best balance them.
Findings
Hybrid Quantization + Early Exit on the Edge tier reduces latency with moderate accuracy loss.
Quantization alone minimizes accuracy loss at various latency levels.
Partitioning is preferable in resource-constrained mobile environments.
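The findings above amount to choosing a strategy from measured (latency, accuracy) pairs. A minimal sketch of that selection step follows, assuming each strategy has already been benchmarked; the names `StrategyResult`, `pareto_front`, and `pick_strategy` are hypothetical and are not taken from the paper, which does not publish a selection procedure here.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass(frozen=True)
class StrategyResult:
    """Hypothetical record for one benchmarked deployment strategy."""
    name: str           # e.g. an operator/tier label such as "Quantization @ Edge"
    latency_ms: float   # mean inference time per input (lower is better)
    accuracy: float     # model performance metric (higher is better)


def pareto_front(results: Sequence[StrategyResult]) -> List[StrategyResult]:
    """Keep strategies that no other strategy beats on both latency and accuracy."""
    front = []
    for r in results:
        dominated = any(
            o.latency_ms <= r.latency_ms
            and o.accuracy >= r.accuracy
            and (o.latency_ms < r.latency_ms or o.accuracy > r.accuracy)
            for o in results
        )
        if not dominated:
            front.append(r)
    # Sorted fastest first, so the list reads as a latency/accuracy trade-off curve
    return sorted(front, key=lambda r: r.latency_ms)


def pick_strategy(
    results: Sequence[StrategyResult], latency_budget_ms: float
) -> Optional[StrategyResult]:
    """Most accurate strategy that fits the latency budget, or None if none fits."""
    feasible = [r for r in results if r.latency_ms <= latency_budget_ms]
    return max(feasible, key=lambda r: r.accuracy, default=None)
```

Filtering to the Pareto front first discards strategies that are both slower and less accurate than some alternative; `pick_strategy` then encodes the "meet a latency requirement, lose as little accuracy as possible" goal described in the abstract.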
Abstract
Deciding what combination of operators to use across the Edge AI tiers to achieve specific latency and model performance requirements is an open question for MLOps engineers. This study aims to empirically assess the accuracy vs. inference time trade-off of different black-box Edge AI deployment strategies, i.e., combinations of deployment operators and deployment tiers. In this paper, we conduct inference experiments involving 3 deployment operators (i.e., Partitioning, Quantization, Early Exit), 3 deployment tiers (i.e., Mobile, Edge, Cloud) and their combinations on four widely used computer vision models to investigate the optimal strategies from the point of view of MLOps developers. Our findings suggest that Edge deployment using the hybrid Quantization + Early Exit operator could be preferred over non-hybrid operators (Quantization/Early Exit on Edge, Partition on Mobile-Edge)…
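The abstract's search space (operators, tiers, and their combinations) can be sketched as a small enumeration. This is a hypothetical reconstruction under stated assumptions, not the paper's exact experimental grid: it assumes Quantization and Early Exit (alone or as a hybrid) run on a single tier, and that Partitioning splits the model across a pair of tiers, optionally combined with the other operators.

```python
from itertools import combinations
from typing import List, Sequence, Tuple

TIERS = ("Mobile", "Edge", "Cloud")
# Assumption: operators that transform the model itself and run on one tier
MODEL_OPERATORS = ("Quantization", "Early Exit")


def operator_subsets(ops: Sequence[str], include_empty: bool) -> List[Tuple[str, ...]]:
    """All subsets of ops in a stable order; hybrids are subsets of size > 1."""
    start = 0 if include_empty else 1
    return [combo for k in range(start, len(ops) + 1) for combo in combinations(ops, k)]


def enumerate_strategies() -> List[Tuple[str, str]]:
    """Hypothetical (operator label, tier label) pairs for the strategy space."""
    strategies = []
    # Single-tier deployment: at least one model operator, e.g. "Quantization + Early Exit" on Edge
    for tier in TIERS:
        for ops in operator_subsets(MODEL_OPERATORS, include_empty=False):
            strategies.append((" + ".join(ops), tier))
    # Partitioned deployment across two tiers, e.g. "Partitioning" on Mobile-Edge
    for pair in combinations(TIERS, 2):
        for ops in operator_subsets(MODEL_OPERATORS, include_empty=True):
            strategies.append((" + ".join(("Partitioning",) + ops), "-".join(pair)))
    return strategies
```

Under these assumptions the space has 9 single-tier and 12 partitioned strategies; the combinations the authors actually benchmarked may differ.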
Taxonomy
Topics: Advanced Neural Network Applications · IoT and Edge/Fog Computing · Graph Theory and Algorithms
