Compact LLM Deployment and World Model Assisted Offloading in Mobile Edge Computing
Ruichen Zhang, Xiaofeng Luo, Jiayi He, Dusit Niyato, Jiawen Kang, Zehui Xiong, and Yonghui Li

TL;DR
This paper presents a framework for deploying compact LLMs on mobile edge devices using pruning, quantization, and knowledge distillation, and introduces a world model-based offloading algorithm to optimize inference latency and quality.
Contribution
The paper combines model compression with a novel world model-PPO (proximal policy optimization) algorithm for efficient LLM inference offloading in MEC networks.
Findings
Model compression reduces storage by 70-80% and energy use by 50%.
World model-PPO accelerates convergence and improves reward by 15.8%.
Inference latency decreases by 12-30% while maintaining accuracy and reducing hallucinations.
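To give a feel for where storage savings of this size can come from, here is a minimal sketch of symmetric low-bit quantization in pure Python. It is not the paper's ECLD pipeline: the function names, the 4-bit width, the single shared scale, and the toy weights are all assumptions chosen for illustration, and the summary above does not state which bit widths or pruning ratios were used.

```python
def quantize_symmetric(weights, bits=4):
    """Map floats to signed integers in [-qmax, qmax] using one shared scale.

    Hypothetical helper for illustration; real toolchains typically use
    per-channel or per-group scales.
    """
    qmax = 2 ** (bits - 1) - 1          # 7 for 4-bit signed values
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    q = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q, scale):
    """Recover approximate float weights from integers and the shared scale."""
    return [v * scale for v in q]


def storage_reduction(orig_bits, new_bits):
    """Fraction of weight storage saved when moving between bit widths."""
    return 1.0 - new_bits / orig_bits


# Toy weights (assumed values, not taken from any model).
weights = [0.70, -0.32, 0.11, 0.0]
q, scale = quantize_symmetric(weights, bits=4)
restored = dequantize(q, scale)

# Rounding error per weight is bounded by half a quantization step.
max_error = max(abs(a - b) for a, b in zip(weights, restored))

# Going from 16-bit to 4-bit weights saves 75% of weight storage.
saving = storage_reduction(16, 4)
```

Under these assumptions, quantization alone from 16-bit to 4-bit weights gives a 75% reduction, which happens to fall inside the reported 70-80% range; the paper's figure reflects its own combination of pruning, quantization, and distillation.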
Abstract
This paper investigates compact large language model (LLM) deployment and world-model-assisted inference offloading in mobile edge computing (MEC) networks. We first propose an edge compact LLM deployment (ECLD) framework that jointly applies structured pruning, low-bit quantization, and knowledge distillation to construct edge-deployable LLM variants, and we evaluate these models using four complementary metrics: accessibility, energy consumption, hallucination rate, and generalization accuracy. Building on the resulting compact models, we formulate an MEC offloading optimization problem that minimizes the long-term average inference latency subject to per-device energy budgets and LLM-specific quality-of-service constraints on effective accuracy and hallucination. To solve this problem under unknown and time-varying network dynamics, we develop a world model-proximal policy…
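The offloading problem described in the abstract can be written schematically as follows. This is a sketch only: every symbol is an assumption introduced here, not the paper's notation. Here $\pi$ is the offloading policy, $N$ the number of devices, $L_n(t)$ and $E_n(t)$ the inference latency and energy use of device $n$ in slot $t$, $A_n(\pi)$ and $H_n(\pi)$ its effective accuracy and hallucination rate, and $E_n^{\max}$, $A^{\min}$, $H^{\max}$ the corresponding thresholds. Whether the energy budget is enforced per slot or as a long-term average is also assumed.

```latex
\begin{aligned}
\min_{\pi}\quad
  & \lim_{T\to\infty}\frac{1}{T}\sum_{t=1}^{T}\frac{1}{N}\sum_{n=1}^{N}
    \mathbb{E}_{\pi}\!\left[L_n(t)\right] \\
\text{s.t.}\quad
  & \lim_{T\to\infty}\frac{1}{T}\sum_{t=1}^{T}
    \mathbb{E}_{\pi}\!\left[E_n(t)\right] \le E_n^{\max}, \quad \forall n, \\
  & A_n(\pi) \ge A^{\min}, \quad \forall n, \\
  & H_n(\pi) \le H^{\max}, \quad \forall n.
\end{aligned}
```

The objective is the long-term average latency, the first constraint is the per-device energy budget, and the last two are the LLM-specific quality-of-service constraints on effective accuracy and hallucination.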
Taxonomy
Topics: IoT and Edge/Fog Computing · Big Data and Digital Economy · Advanced Neural Network Applications
