On the Impact of White-box Deployment Strategies for Edge AI on Latency and Model Performance

Jaskirat Singh; Bram Adams; Ahmed E. Hassan

arXiv:2411.00907 · cs.DC · January 23, 2025

Open Access

TL;DR

This study empirically compares white-box and black-box deployment strategies for Edge AI, analyzing their accuracy-latency trade-offs across different models and tiers to guide MLOps engineers.

Contribution

It provides a comprehensive empirical assessment of various deployment operators and their combinations, highlighting effective strategies for latency and accuracy trade-offs in Edge AI.

Findings

01 Distillation combined with SPTQ (DSPTQ) offers lower latency than non-hybrid operators at the edge, at a small to medium accuracy loss.

02 Distilled operators outperform partitioning in resource-constrained tiers.

03 Cloud deployment is preferable for models with small input sizes, while Edge is better for models with large input sizes.

Abstract

To help MLOps engineers decide which operator to use in which deployment scenario, this study empirically assesses the accuracy vs. latency trade-off of white-box (training-based) operators, black-box (non-training-based) operators, and their combinations in an Edge AI setup. We perform inference experiments with 3 white-box operators (i.e., QAT, Pruning, Knowledge Distillation), 2 black-box operators (i.e., Partition, SPTQ), and their combined operators (i.e., Distilled SPTQ, SPTQ Partition) across 3 tiers (i.e., Mobile, Edge, Cloud) on 4 commonly used Computer Vision and Natural Language Processing models to identify effective strategies from the perspective of MLOps engineers. Our results indicate that the combination of the Distillation and SPTQ operators (i.e., DSPTQ) should be preferred over non-hybrid operators when lower latency is required at the edge and a small to medium accuracy drop is acceptable.…
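
The accuracy vs. latency trade-off described in the abstract can be illustrated with a minimal Python sketch. It assumes each operator/tier combination has already been measured, and the numbers below are hypothetical placeholders, not results from the paper. The names `Candidate`, `pareto_front`, and `pick` are illustrative and do not come from the study.

```python
from typing import NamedTuple, Optional


class Candidate(NamedTuple):
    """One measured deployment option (hypothetical structure)."""
    name: str          # label for an operator/tier combination
    latency_ms: float  # measured inference latency
    accuracy: float    # measured model accuracy in [0, 1]


def pareto_front(cands: list[Candidate]) -> list[Candidate]:
    """Keep candidates that no other candidate beats on both axes."""
    front = []
    for c in cands:
        dominated = any(
            o.latency_ms <= c.latency_ms
            and o.accuracy >= c.accuracy
            and (o.latency_ms < c.latency_ms or o.accuracy > c.accuracy)
            for o in cands
        )
        if not dominated:
            front.append(c)
    return sorted(front, key=lambda c: c.latency_ms)


def pick(cands: list[Candidate], baseline_accuracy: float,
         max_accuracy_drop: float) -> Optional[Candidate]:
    """Fastest non-dominated candidate within the accepted accuracy drop."""
    ok = [c for c in pareto_front(cands)
          if baseline_accuracy - c.accuracy <= max_accuracy_drop]
    return min(ok, key=lambda c: c.latency_ms) if ok else None


# Placeholder measurements, made up purely for illustration.
measurements = [
    Candidate("option_a", 100.0, 0.90),
    Candidate("option_b", 70.0, 0.89),
    Candidate("option_c", 50.0, 0.86),
    Candidate("option_d", 120.0, 0.90),  # slower than option_a, same accuracy
    Candidate("option_e", 60.0, 0.85),   # slower and less accurate than option_c
]

if __name__ == "__main__":
    print([c.name for c in pareto_front(measurements)])
    print(pick(measurements, baseline_accuracy=0.90, max_accuracy_drop=0.02))
```

Framing the choice this way separates two questions: which options are never worth deploying because another option is both faster and at least as accurate, and which of the remaining options fits the accuracy drop a given scenario can tolerate.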

Taxonomy

Topics: IoT and Edge/Fog Computing

Methods: Pruning