Compact LLM Deployment and World Model Assisted Offloading in Mobile Edge Computing

Ruichen Zhang, Xiaofeng Luo, Jiayi He, Dusit Niyato, Jiawen Kang, Zehui Xiong, and Yonghui Li

arXiv:2602.13628 · cs.NI · February 17, 2026

TL;DR

This paper presents a framework for deploying compact LLMs on mobile edge devices using pruning, quantization, and knowledge distillation, and introduces a world-model-based offloading algorithm to optimize inference latency and quality.
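The TL;DR names pruning, quantization, and knowledge distillation without saying how the framework applies them. As a rough illustration only, the Python sketch below implements textbook versions of each on plain lists. It assumes row-wise L2-norm structured pruning, symmetric per-row integer quantization, and a temperature-scaled KL distillation loss. These are common choices and not necessarily the paper's, and all function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def prune_rows(weights: Matrix, keep_ratio: float) -> Matrix:
    """Hypothetical structured pruning: keep whole rows (e.g. output neurons)
    with the largest L2 norms and drop the rest entirely."""
    n_keep = max(1, round(len(weights) * keep_ratio))
    norms = [math.sqrt(sum(w * w for w in row)) for row in weights]
    ranked = sorted(range(len(weights)), key=lambda i: norms[i], reverse=True)
    keep = sorted(ranked[:n_keep])  # restore original row order
    return [list(weights[i]) for i in keep]


def quantize_symmetric(row: Sequence[float], bits: int) -> Tuple[List[int], float]:
    """Hypothetical symmetric per-row quantization to signed `bits`-bit codes."""
    qmax = 2 ** (bits - 1) - 1  # e.g. 7 for 4-bit
    max_abs = max(abs(w) for w in row)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    codes = [max(-qmax, min(qmax, round(w / scale))) for w in row]
    return codes, scale


def dequantize(codes: Sequence[int], scale: float) -> List[float]:
    """Map integer codes back to approximate float weights."""
    return [c * scale for c in codes]


def softmax(logits: Sequence[float], temperature: float) -> List[float]:
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(teacher_logits: Sequence[float],
                      student_logits: Sequence[float],
                      temperature: float = 2.0) -> float:
    """KL(teacher || student) between temperature-softened distributions,
    scaled by T^2 (a common convention; assumed, not taken from the paper)."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return kl * temperature ** 2
```

For example, `prune_rows` on the 4×3 matrix `[[0.9, -0.4, 0.1], [0.01, 0.02, -0.01], [-0.7, 0.3, 0.5], [0.05, -0.05, 0.0]]` with `keep_ratio=0.5` keeps rows 0 and 2. Quantizing `[0.9, -0.4, 0.1]` to 4 bits yields codes `[7, -3, 1]` with scale 0.9/7.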

Contribution

The paper introduces an integrated approach that combines model compression with a novel world model-proximal policy optimization (world model-PPO) algorithm for efficient LLM inference offloading in MEC networks.
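The summary pairs a world model with PPO but does not say how the two interact. For reference, the sketch below shows only the standard PPO clipped surrogate objective. It assumes per-sample log-probabilities under the new and old policies and advantage estimates are already available. In a world-model-assisted variant, some of these samples might come from rollouts imagined by the learned model; that is an assumption, not something stated here. The function name and the default `clip_eps=0.2` are illustrative.

```python
import math
from typing import Sequence


def ppo_clip_objective(logp_new: Sequence[float],
                       logp_old: Sequence[float],
                       advantages: Sequence[float],
                       clip_eps: float = 0.2) -> float:
    """Mean PPO clipped surrogate objective (to be maximized).
    Hypothetical helper; only the standard PPO-Clip formula is shown."""
    total = 0.0
    for ln, lo, adv in zip(logp_new, logp_old, advantages):
        ratio = math.exp(ln - lo)  # pi_new(a|s) / pi_old(a|s)
        clipped = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
        # Pessimistic bound: take the smaller of unclipped and clipped terms.
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)
```

Taking the minimum makes the objective a pessimistic bound. Moving the ratio outside [1 - clip_eps, 1 + clip_eps] earns no extra credit, but a move that lowers the objective is still fully penalized, which keeps each policy update conservative.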

Findings

01. Model compression reduces storage by 70–80% and energy use by 50%.
02. World model-PPO accelerates convergence and improves reward by 15.8%.
03. Inference latency decreases by 12–30% while accuracy is maintained and hallucinations are reduced.
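For intuition about the size of the storage reduction, the arithmetic below uses a hypothetical 7-billion-parameter model. The model size, the FP16 baseline, the 4-bit width, and the 20% pruning ratio are illustrative assumptions, not the paper's configuration, and the calculation ignores quantization scales and other metadata.

```python
def weight_storage_gb(n_params: float, bits_per_weight: int,
                      keep_fraction: float = 1.0) -> float:
    """Weight storage in GB (1 GB = 1e9 bytes); ignores scales and metadata."""
    return n_params * keep_fraction * bits_per_weight / 8 / 1e9


def relative_reduction(baseline: float, compressed: float) -> float:
    """Fraction of storage saved relative to the baseline."""
    return 1.0 - compressed / baseline


# Hypothetical 7B-parameter model; FP16 storage is the baseline.
fp16 = weight_storage_gb(7e9, 16)                           # 14 GB
int4 = weight_storage_gb(7e9, 4)                            # 3.5 GB
int4_pruned = weight_storage_gb(7e9, 4, keep_fraction=0.8)  # about 2.8 GB

print(f"{relative_reduction(fp16, int4):.0%}, {relative_reduction(fp16, int4_pruned):.0%}")
# prints: 75%, 80%
```

Under these assumptions, 4-bit quantization alone saves 75%, and adding 20% structured pruning saves 80%, the same order of magnitude as the reported range.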

Abstract

This paper investigates compact large language model (LLM) deployment and world-model-assisted inference offloading in mobile edge computing (MEC) networks. We first propose an edge compact LLM deployment (ECLD) framework that jointly applies structured pruning, low-bit quantization, and knowledge distillation to construct edge-deployable LLM variants, and we evaluate these models using four complementary metrics: accessibility, energy consumption, hallucination rate, and generalization accuracy. Building on the resulting compact models, we formulate an MEC offloading optimization problem that minimizes the long-term average inference latency subject to per-device energy budgets and LLM-specific quality-of-service constraints on effective accuracy and hallucination. To solve this problem under unknown and time-varying network dynamics, we develop a world model-proximal policy…
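The abstract states the offloading problem only in words. One way to write it down is sketched below. All notation is assumed rather than taken from the paper: N devices indexed by n, time slots t, an offloading action a_n(t), inference latency D_n(t), energy consumption E_n(t) with per-device budget \bar{E}_n, and effective accuracy A_n(t) and hallucination rate H_n(t) with thresholds A_{\min} and H_{\max}. Whether the energy budget applies per slot or on average, and how the quality terms are defined, are not stated in this summary.

```latex
\begin{aligned}
\min_{\{a_n(t)\}} \quad & \lim_{T\to\infty} \frac{1}{T} \sum_{t=1}^{T} \frac{1}{N} \sum_{n=1}^{N} \mathbb{E}\left[ D_n(t) \right] \\
\text{s.t.} \quad & \lim_{T\to\infty} \frac{1}{T} \sum_{t=1}^{T} \mathbb{E}\left[ E_n(t) \right] \le \bar{E}_n, \quad \forall n, \\
& \mathbb{E}\left[ A_n(t) \right] \ge A_{\min}, \quad \forall n, t, \\
& \mathbb{E}\left[ H_n(t) \right] \le H_{\max}, \quad \forall n, t.
\end{aligned}
```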

Taxonomy

Topics: IoT and Edge/Fog Computing · Big Data and Digital Economy · Advanced Neural Network Applications