Collaborative Inference for Large Models with Task Offloading and Early Exiting
Zuan Xie, Yang Xu, Hongli Xu, Yunming Liao, Zhiyuan Yao

TL;DR
This paper proposes DTO-EE, a distributed algorithm for collaborative inference of large models at the network edge. It jointly optimizes task offloading and early-exit confidence thresholds to reduce response delay and improve accuracy in dynamic 5G smart city environments.
Contribution
The paper contributes a theoretical analysis and a distributed algorithm, based on convex optimization, that jointly optimizes task offloading and early-exit thresholds in heterogeneous edge systems.
Findings
Reduces average response delay by 21%-41%.
Improves inference accuracy by 1%-4%.
Effectively balances delay and accuracy in dynamic environments.
Abstract
In 5G smart cities, edge computing provides nearby computing services for end devices, and large-scale models (e.g., GPT and LLaMA) can be deployed at the network edge to improve service quality. However, because of limited memory and computing capacity, it is difficult to run these large-scale models on a single edge node. To meet the resource constraints, a large-scale model can be partitioned into multiple sub-models and deployed across multiple edge nodes. Tasks are then offloaded to the edge nodes for collaborative inference. Additionally, we incorporate the early exit mechanism to further accelerate inference. However, the heterogeneous system and dynamic environment significantly affect inference efficiency. To address these challenges, we theoretically analyze the coupled relationship between task offloading strategy and confidence…
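The early exit mechanism mentioned in the abstract can be illustrated with a minimal sketch. This is not DTO-EE and does not reproduce the paper's algorithm: it only shows the general idea of a model split into sequential sub-models, each with an exit head, where a task stops at the first stage whose prediction confidence reaches that stage's threshold. All names here (`make_toy_stage`, `infer_with_early_exit`, the `gain` parameter, the threshold values) are hypothetical, and the toy stages stand in for real sub-models running on separate edge nodes.

```python
import math
from typing import Callable, List, Sequence, Tuple

# Hypothetical stage type: maps features to (new features, exit-head class probabilities).
Stage = Callable[[List[float]], Tuple[List[float], List[float]]]


def softmax(z: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [v / total for v in exps]


def make_toy_stage(gain: float) -> Stage:
    """Toy sub-model (assumption): scales features and uses them as exit-head logits."""
    def stage(features: List[float]) -> Tuple[List[float], List[float]]:
        out = [gain * v for v in features]
        return out, softmax(out)
    return stage


def infer_with_early_exit(
    x: Sequence[float],
    stages: Sequence[Stage],
    thresholds: Sequence[float],
) -> Tuple[int, float, int]:
    """Run stages in order and return (predicted_class, confidence, exit_stage_index).

    The task exits at the first stage whose top class probability is at least
    that stage's threshold; the last stage always produces a result.
    """
    if not stages or len(stages) != len(thresholds):
        raise ValueError("need one threshold per stage and at least one stage")
    features = list(x)
    for i, (stage, tau) in enumerate(zip(stages, thresholds)):
        features, probs = stage(features)
        confidence = max(probs)
        is_last = i == len(stages) - 1
        if confidence >= tau or is_last:
            return probs.index(confidence), confidence, i
    raise AssertionError("unreachable: the last stage always returns")


# Three toy sub-models; later stages produce sharper (more confident) predictions.
stages = [make_toy_stage(2.0) for _ in range(3)]
predicted, confidence, exit_stage = infer_with_early_exit(
    [1.0, 0.0], stages, thresholds=[0.9, 0.9, 0.9]
)
```

In this sketch, lowering a threshold lets more tasks leave at an earlier stage, which shortens the path through the sub-models at the risk of less confident predictions. That trade-off is why exit thresholds interact with where tasks are offloaded.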
Taxonomy
Topics: Distributed and Parallel Computing Systems · Age of Information Optimization · Scientific Computing and Data Management
