Communication-Computation Trade-Off in Resource-Constrained Edge Inference
Jiawei Shao, Jun Zhang

TL;DR
This paper introduces a three-step framework for optimizing edge AI inference by balancing computation and communication costs, significantly reducing latency on resource-limited devices.
Contribution
It proposes a novel framework combining model split, compression, and encoding techniques for efficient device-edge co-inference.
Findings
Achieves better trade-offs between computation and communication costs.
Reduces inference latency significantly compared to baseline methods.
Effectively balances model complexity and data transmission overhead.
Abstract
Recent breakthroughs in artificial intelligence (AI), especially deep neural networks (DNNs), have affected every branch of science and technology. In particular, edge AI has been envisioned as a major application scenario for providing DNN-based services at edge devices. This article presents effective methods for edge inference on resource-constrained devices. It focuses on device-edge co-inference, assisted by an edge computing server, and investigates a critical trade-off between the computation cost of the on-device model and the communication cost of forwarding the intermediate feature to the edge server. A three-step framework is proposed for effective inference: (1) model split point selection to determine the on-device model, (2) communication-aware model compression to reduce the on-device computation and the resulting communication overhead simultaneously, and (3)…
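The trade-off behind step (1) can be illustrated with a minimal sketch. This is not the authors' method: it assumes a simple latency model (on-device compute time plus the time to upload the intermediate feature, ignoring server-side compute), and the names `Layer`, `select_split_point`, `device_flops_per_s`, and `uplink_bytes_per_s` are hypothetical, introduced only for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Layer:
    name: str
    flops: float         # compute needed to run this layer (hypothetical profile)
    output_bytes: float  # size of the feature this layer emits


def select_split_point(
    layers: List[Layer],
    input_bytes: float,
    device_flops_per_s: float,
    uplink_bytes_per_s: float,
) -> Tuple[int, float]:
    """Return (k, latency): run the first k layers on the device, then upload.

    Assumed latency model: cumulative on-device compute time plus the
    transmission time of the feature at the split. k = 0 means the raw
    input is uploaded and the whole model runs on the edge server.
    """
    best_k = 0
    best_latency = input_bytes / uplink_bytes_per_s
    compute_time = 0.0
    for k, layer in enumerate(layers, start=1):
        compute_time += layer.flops / device_flops_per_s
        latency = compute_time + layer.output_bytes / uplink_bytes_per_s
        if latency < best_latency:
            best_k, best_latency = k, latency
    return best_k, best_latency


# Illustrative numbers only, not measurements from the paper.
model = [
    Layer("conv1", flops=1e8, output_bytes=2e5),
    Layer("conv2", flops=4e8, output_bytes=1e5),
    Layer("fc", flops=1e8, output_bytes=4e3),
]
k, latency = select_split_point(
    model, input_bytes=6e5, device_flops_per_s=1e9, uplink_bytes_per_s=1e6
)
print(f"split after {k} layer(s), estimated latency {latency:.3f} s")
```

With these illustrative numbers the split lands after `conv1`: running deeper layers on the device shrinks the feature further, but the added compute time outweighs the saved transmission time. A slower uplink shifts the best split deeper into the model.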
Taxonomy
Topics: IoT and Edge/Fog Computing · Advanced Neural Network Applications · Age of Information Optimization
