MobiZO: Enabling Efficient LLM Fine-Tuning at the Edge via Inference Engines
Lei Gao, Amir Ziashahabi, Yue Niu, Salman Avestimehr, Murali Annavaram

TL;DR
MobiZO is a novel framework that enables efficient fine-tuning of large language models directly on edge devices by combining parallelized gradient estimation, specialized modules, and seamless integration with inference engines.
Contribution
It introduces a resource-efficient fine-tuning method for LLMs on edge devices, leveraging parallelism and a new module to reduce computational costs and memory usage.
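MobiZO's use of parallelism in gradient estimation can be illustrated with a minimal sketch. It assumes that the loss for several perturbed copies of the trainable parameters can be computed in one batched forward call. `batched_loss_fn` and `batched_rge_gradient` are hypothetical names introduced here. The sketch shows the generic idea of folding every perturbation into a single batch. It is not MobiZO's actual module or its ExecuTorch integration.

```python
import random

def batched_rge_gradient(batched_loss_fn, theta, q=4, eps=1e-3, seed=0):
    """Two-point randomized gradient estimate using ONE batched loss call.

    Hypothetical interface: batched_loss_fn takes a list of parameter
    vectors and returns a list of losses, one per vector, in the same order.
    """
    rng = random.Random(seed)
    d = len(theta)
    # Draw q random Gaussian perturbation directions z_1..z_q.
    dirs = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(q)]
    # Rows 0..q-1 hold theta + eps*z_k; rows q..2q-1 hold theta - eps*z_k.
    batch = [[t + eps * zi for t, zi in zip(theta, z)] for z in dirs]
    batch += [[t - eps * zi for t, zi in zip(theta, z)] for z in dirs]
    # A single call evaluates all 2*q perturbed parameter vectors.
    losses = batched_loss_fn(batch)
    grad = [0.0] * d
    for k, z in enumerate(dirs):
        # Central difference approximates the directional derivative z_k . grad L.
        proj = (losses[k] - losses[q + k]) / (2 * eps)
        for i in range(d):
            grad[i] += proj * z[i] / q  # average proj * z_k over the q directions
    return grad
```

Packing all 2q perturbations into one call replaces 2q sequential forward passes with one larger batched pass. The trade-off is higher activation memory, which is the kind of exchange a batched inference engine could be expected to exploit.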
Findings
Achieves significant runtime speedups
Reduces memory consumption
Improves fine-tuning accuracy on edge devices
Abstract
Large Language Models (LLMs) are currently pre-trained and fine-tuned on large cloud servers. The next frontier is LLM personalization, where a foundation model can be fine-tuned with user/task-specific data. Given the sensitive nature of such private data, it is desirable to fine-tune these models on edge devices to improve user trust. However, fine-tuning on resource-constrained edge devices presents significant challenges due to substantial memory and computational demands, as well as limited infrastructure support. We observe that inference engines (e.g., ExecuTorch) can be repurposed for fine-tuning by leveraging zeroth-order (ZO) optimization, which uses multiple forward passes to approximate gradients. While promising, direct application of ZO methods on edge devices is inefficient due to the high computational cost of multiple forward passes required for accurate gradient…
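The abstract states that ZO optimization approximates gradients from forward passes alone. That idea can be sketched with the standard two-point (SPSA-style) randomized gradient estimator. This is an illustrative assumption, not MobiZO's parallelized estimator. `rge_gradient`, `zo_sgd`, and the toy loss are hypothetical, and in a real setting `loss_fn` would be a forward pass of the model on a training batch.

```python
import random

def rge_gradient(loss_fn, theta, q=4, eps=1e-3, seed=0):
    """Estimate grad of loss_fn at theta from 2*q forward evaluations only.

    For each Gaussian direction z, (L(theta + eps*z) - L(theta - eps*z)) / (2*eps)
    approximates z . grad L. Scaling z by that value and averaging over q
    directions gives an estimate whose expectation approximates grad L.
    """
    rng = random.Random(seed)
    d = len(theta)
    grad = [0.0] * d
    for _ in range(q):
        z = [rng.gauss(0.0, 1.0) for _ in range(d)]
        plus = [t + eps * zi for t, zi in zip(theta, z)]
        minus = [t - eps * zi for t, zi in zip(theta, z)]
        proj = (loss_fn(plus) - loss_fn(minus)) / (2 * eps)  # two forward passes
        for i in range(d):
            grad[i] += proj * z[i] / q
    return grad

def zo_sgd(loss_fn, theta, lr=0.05, steps=200, q=4, eps=1e-3):
    """Plain SGD driven by the ZO estimate; no backward pass is ever run."""
    theta = list(theta)
    for step in range(steps):
        g = rge_gradient(loss_fn, theta, q=q, eps=eps, seed=step)
        theta = [t - lr * gi for t, gi in zip(theta, g)]
    return theta
```

Each step costs 2q forward evaluations and no backpropagation, so only an inference path is required. The estimate is noisier than a true gradient and improves as q grows. Reaching usable accuracy therefore takes many forward passes, which is the cost the abstract identifies as the obstacle on edge devices.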
