Communication-Computation Trade-Off in Resource-Constrained Edge Inference

Jiawei Shao; Jun Zhang

arXiv:2006.02166 · cs.LG · October 30, 2024 · 6 cites

TL;DR

This paper introduces a three-step framework for optimizing edge AI inference by balancing computation and communication costs, significantly reducing latency on resource-limited devices.

Contribution

It proposes a novel framework combining model split, compression, and encoding techniques for efficient device-edge co-inference.

Findings

01 Achieves better trade-offs between computation and communication costs.

02 Reduces inference latency significantly compared to baseline methods.

03 Effectively balances model complexity and data transmission overhead.

Abstract

The recent breakthrough in artificial intelligence (AI), especially deep neural networks (DNNs), has affected every branch of science and technology. In particular, edge AI has been envisioned as a major application scenario for providing DNN-based services at edge devices. This article presents effective methods for edge inference on resource-constrained devices. It focuses on device-edge co-inference, assisted by an edge computing server, and investigates a critical trade-off between the computation cost of the on-device model and the communication cost of forwarding the intermediate feature to the edge server. A three-step framework is proposed for effective inference: (1) model split point selection to determine the on-device model, (2) communication-aware model compression to reduce the on-device computation and the resulting communication overhead simultaneously, and (3)…
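
The trade-off behind step (1) can be illustrated with a small sketch. This is not the authors' method or code. It assumes a simple additive latency model (on-device compute time, plus uplink time for the intermediate feature, plus server compute time) and picks the split point by exhaustive search. The `Layer` type, the layer costs in `LAYERS`, and all throughput figures are hypothetical values chosen for illustration only.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Layer:
    name: str
    flops: float      # hypothetical compute cost of running this layer
    out_bytes: float  # hypothetical size of this layer's output feature


def end_to_end_latency(layers: List[Layer], split: int, input_bytes: float,
                       device_flops_per_s: float, server_flops_per_s: float,
                       uplink_bytes_per_s: float) -> float:
    """Latency when the first `split` layers run on the device.

    Assumed model: device compute + feature transmission + server compute.
    split == 0 sends the raw input; split == len(layers) runs fully on-device
    and only sends the final output.
    """
    device_time = sum(l.flops for l in layers[:split]) / device_flops_per_s
    server_time = sum(l.flops for l in layers[split:]) / server_flops_per_s
    sent_bytes = input_bytes if split == 0 else layers[split - 1].out_bytes
    return device_time + sent_bytes / uplink_bytes_per_s + server_time


def best_split(layers: List[Layer], input_bytes: float,
               device_flops_per_s: float, server_flops_per_s: float,
               uplink_bytes_per_s: float) -> int:
    """Exhaustively pick the split index with the lowest modeled latency."""
    return min(
        range(len(layers) + 1),
        key=lambda s: end_to_end_latency(layers, s, input_bytes,
                                         device_flops_per_s,
                                         server_flops_per_s,
                                         uplink_bytes_per_s),
    )


# Hypothetical four-layer network: early layers inflate the feature size,
# later layers shrink it but cost more on-device computation.
LAYERS = [
    Layer("conv1", flops=1e8, out_bytes=800_000),
    Layer("conv2", flops=2e8, out_bytes=200_000),
    Layer("conv3", flops=4e8, out_bytes=50_000),
    Layer("fc", flops=1e8, out_bytes=40),
]
INPUT_BYTES = 600_000
DEVICE_FLOPS_PER_S = 1e9
SERVER_FLOPS_PER_S = 1e11
```

Under these made-up numbers, calling `best_split(LAYERS, INPUT_BYTES, DEVICE_FLOPS_PER_S, SERVER_FLOPS_PER_S, uplink)` moves the chosen split deeper into the network as `uplink` drops: a fast link favors sending the raw input, while a slow link favors running more layers on the device. Steps (2) and (3) of the framework are not modeled here.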

Code & Models

Repositories

shaojiawei07/Edge_Inference_three-step_framework (PyTorch, official)

Taxonomy

Topics: IoT and Edge/Fog Computing · Advanced Neural Network Applications · Age of Information Optimization