A Task Decomposition and Planning Framework for Efficient LLM Inference in AI-Enabled WiFi-Offload Networks
Mingqi Han, Xinghua Sun

TL;DR
This paper introduces a task decomposition and scheduling framework for efficient large language model inference in WiFi-offload networks, reducing latency and improving accuracy through collaborative edge computing.
Contribution
It proposes an LLM-based planner for task decomposition and a decomposition-aware scheduling strategy for multi-user edge offloading.
Findings
The proposed framework reduces average latency by 20% compared to baselines.
It improves overall reward by 80%.
The lightweight planner approaches the performance of the large teacher model.
Abstract
AI WiFi offload is emerging as a promising approach for providing large language model (LLM) services to resource-constrained wireless devices. However, unlike conventional edge computing, LLM inference over WiFi must jointly address heterogeneous model capabilities, wireless contention, uncertain task complexity, and semantic correlation among reasoning tasks. In this paper, we investigate LLM inference offloading in a multi-user multi-edge WiFi network, where each task can be executed locally, directly offloaded to a nearby edge access point (AP), or decomposed into multiple subtasks for collaborative execution across local and edge nodes. To this end, we propose a user-edge collaborative framework with an LLM-based planner that not only performs task decomposition but also infers subtask difficulty and expected output token length, enabling more accurate estimation of execution…
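To make the role of the predicted output token length concrete, the following is a minimal sketch of how such a prediction could feed a per-subtask placement decision. It is not the paper's scheduler: the `Node` and `Subtask` classes, the `tokens_per_second` and `uplink_seconds` parameters, the linear latency model, and the greedy assignment rule are all assumptions chosen for illustration. The sketch also ignores subtask difficulty, model accuracy, and contention among users, which the abstract says the framework addresses.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """A hypothetical execution node (the local device or an edge AP)."""
    name: str
    tokens_per_second: float  # assumed decode throughput of the node's model
    uplink_seconds: float     # assumed WiFi transfer delay; 0 for local execution


@dataclass
class Subtask:
    """A hypothetical subtask as a planner might describe it."""
    name: str
    expected_output_tokens: int  # output length predicted by the planner


def estimated_latency(subtask: Subtask, node: Node) -> float:
    """Assumed linear model: transfer delay plus decode time."""
    return node.uplink_seconds + subtask.expected_output_tokens / node.tokens_per_second


def assign(subtasks: list[Subtask], nodes: list[Node]) -> dict[str, str]:
    """Greedily place each subtask on the node with the lowest estimated latency."""
    return {
        s.name: min(nodes, key=lambda n: estimated_latency(s, n)).name
        for s in subtasks
    }


# Illustrative numbers only; none of them come from the paper.
local = Node("local", tokens_per_second=10.0, uplink_seconds=0.0)
edge = Node("edge_ap", tokens_per_second=50.0, uplink_seconds=2.0)
plan = assign([Subtask("short", 20), Subtask("long", 200)], [local, edge])
print(plan)
```

Under these assumed numbers, the short subtask stays local because the transfer delay outweighs the faster edge decode, while the long subtask is offloaded. This is the kind of trade-off that a predicted output length makes visible before execution.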
