MobiEdit: Resource-efficient Knowledge Editing for Personalized On-device LLMs
Zhenyan Lu, Daliang Xu, Dongqi Cai, Zexi Li, Wei Liu, Fangming Liu, Shangguang Wang, Mengwei Xu

TL;DR
MobiEdit introduces a resource-efficient framework for personalized knowledge editing of large language models directly on mobile devices, significantly reducing memory, energy, and latency requirements.
Contribution
It presents the first mobile knowledge editing method that replaces full backpropagation with quantized forward-only gradient estimation, enabling real-time personalization on COTS mobile devices.
Findings
Enables real-time editing of 3B-parameter models on mobile devices.
Achieves 7.6× less memory, 14.7× less energy, and 3.6× less latency.
Compatible with energy-efficient mobile NPUs.
Abstract
Large language models (LLMs) are deployed on mobile devices to power killer applications such as intelligent assistants. LLMs pre-trained on general corpora often hallucinate when handling personalized or unseen queries, leading to incorrect or outdated responses. Knowledge editing addresses this by identifying and adjusting a small, crucial portion of the model weights without compromising general knowledge. However, prior knowledge editing methods are impractical to run on local devices because of the resource-heavy backpropagation (BP) needed for updates. We present MobiEdit, the first mobile knowledge editing framework that enables efficient LLM personalization on commercial off-the-shelf (COTS) mobile devices. MobiEdit replaces full-precision BP with quantized forward-only gradient estimation, making it compatible with energy-efficient mobile neural processing units (NPUs). MobiEdit…
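The forward-only gradient estimation mentioned in the abstract can be illustrated with a generic zeroth-order estimator. The sketch below is not MobiEdit's implementation: it omits quantization entirely, and `toy_loss`, `zo_gradient`, `zo_descent`, and all parameter values are hypothetical names and settings chosen for illustration. It only shows how a gradient can be approximated from loss evaluations at perturbed parameters, with no backward pass.

```python
import random


def zo_gradient(loss_fn, params, num_directions=8, mu=1e-3, rng=None):
    """Estimate the gradient of loss_fn at params from forward evaluations only."""
    if rng is None:
        rng = random.Random(0)
    grad = [0.0] * len(params)
    for _ in range(num_directions):
        # Sample a random Gaussian perturbation direction.
        u = [rng.gauss(0.0, 1.0) for _ in params]
        plus = [p + mu * ui for p, ui in zip(params, u)]
        minus = [p - mu * ui for p, ui in zip(params, u)]
        # Central difference: approximates the directional derivative along u.
        slope = (loss_fn(plus) - loss_fn(minus)) / (2.0 * mu)
        # Averaging slope * u over directions approximates the gradient.
        for i, ui in enumerate(u):
            grad[i] += slope * ui / num_directions
    return grad


def toy_loss(params):
    """Hypothetical stand-in for an editing loss: squared distance to a target."""
    target = [1.0, -2.0, 0.5]
    return sum((p - t) ** 2 for p, t in zip(params, target))


def zo_descent(loss_fn, params, steps=300, lr=0.05, seed=0):
    """Plain gradient descent driven by the forward-only estimate."""
    rng = random.Random(seed)
    params = list(params)
    for _ in range(steps):
        grad = zo_gradient(loss_fn, params, rng=rng)
        params = [p - lr * g for p, g in zip(params, grad)]
    return params


if __name__ == "__main__":
    edited = zo_descent(toy_loss, [0.0, 0.0, 0.0])
    print(toy_loss(edited))
```

Each update costs two loss evaluations per sampled direction and stores no activations for a backward pass, which is the general property that makes this family of estimators attractive when memory is tight.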
Peer Reviews
Decision · ICLR 2026 Poster
S1. Timely work with well-motivated design decisions. All the design choices, such as NPU compatibility, memory efficiency, mixed precision, prefix reuse, and early stopping, are justified for the mobile environment. S2. Detailed empirical validation and system evaluation. The paper analyzes a range of system metrics, including energy profiling and thermal pressure.
W1. The CPU baseline uses llm.c, which is not an optimised implementation; the llm.c GitHub page itself describes it as a slightly tweaked version of nanoGPT, which is a learning project. There are many optimised CPU implementations of LLMs, including llama.cpp [1], that the authors could have used. It is therefore unclear whether the gain comes from comparing an unoptimised CPU implementation (llm.c) against an optimised NPU implementation or from the algorithm design. W2. Even though the paper's major claim is re
1. Technically sound: the forward-only zeroth-order editing with NPU-friendly mixed precision removes the memory needs of backpropagation and is shown to be more robust under low-bit quantization. Prefix-activation reuse and early stopping further cut compute. 3. Strong on-device results on commercial phones.
1. Zeroth-order editing needs many more optimization steps to reach similar convergence; without early stopping and caching, the added wall-clock time can erase the efficiency gains. 2. Sensitivity to hyperparameters: performance depends on the number of sampled directions (loss stability varies across 1/3/5 vs. 300 directions) and on the early-stopping confidence threshold.
The paper is well written and has the following strengths: - The paper explores on-device LLM personalization on resource-constrained devices (mobile devices), which is an interesting and relevant topic. - MobiEdit is optimized for a new trend in computing hardware (NPUs) via the use of forward-only editing. - The paper provides a theoretical justification showing that quantization without backpropagation (BP) is inherently more resilient to noise than BP-based quantization. - Com
While the topic is interesting, there are still a few weaknesses that need to be addressed: - The novelty of the paper is somewhat incremental, as its primary contributions build heavily on prior work. The concepts of zeroth-order gradient estimation and quantization have been explored previously; however, there is still a degree of novelty in adapting and applying these ideas to NPUs. - No empirical experiments to support the theoretical claim that MobiEdit quantization is more robust to noise
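The early-stopping confidence threshold and the number of sampled directions that the reviews raise as sensitivity concerns can be made concrete with a toy sketch. Everything here is an assumption for illustration: the "model" is just a vector of logits, and `edit_until_confident`, its parameters, and their default values are hypothetical rather than taken from the paper. The loop applies zeroth-order updates to a cross-entropy loss and stops as soon as the probability of the target token reaches the threshold.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def edit_until_confident(logits, target, threshold=0.9, max_steps=2000,
                         num_directions=4, mu=1e-3, lr=0.5, seed=0):
    """Run forward-only updates until the target probability reaches threshold."""
    rng = random.Random(seed)
    logits = list(logits)

    def loss(x):
        # Cross-entropy of the target token under the toy logits.
        return -math.log(softmax(x)[target])

    for step in range(max_steps):
        confidence = softmax(logits)[target]
        # Early stopping: skip remaining steps once the edit is confident enough.
        if confidence >= threshold:
            return logits, step, confidence
        grad = [0.0] * len(logits)
        for _ in range(num_directions):
            u = [rng.gauss(0.0, 1.0) for _ in logits]
            plus = [x + mu * ui for x, ui in zip(logits, u)]
            minus = [x - mu * ui for x, ui in zip(logits, u)]
            slope = (loss(plus) - loss(minus)) / (2.0 * mu)
            for i, ui in enumerate(u):
                grad[i] += slope * ui / num_directions
        logits = [x - lr * g for x, g in zip(logits, grad)]
    return logits, max_steps, softmax(logits)[target]


if __name__ == "__main__":
    for threshold in (0.9, 0.99):
        _, steps, conf = edit_until_confident([0.0] * 4, target=2,
                                              threshold=threshold)
        print(threshold, steps, round(conf, 4))
```

With a fixed seed the two runs follow the same trajectory, so the stricter threshold can only stop at the same step or later. Varying `num_directions` in this toy changes how noisy each update is, which is the kind of sensitivity the reviewer points to.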
Taxonomy
Topics: Semantic Web and Ontologies · Service-Oriented Architecture and Web Services · Digital Rights Management and Security
