Zeroth-Order Fine-Tuning of LLMs with Extreme Sparsity
Wentao Guo, Jikai Long, Yimeng Zeng, Zirui Liu, Xinyu Yang, Yide Ran, Jacob R. Gardner, Osbert Bastani, Christopher De Sa, Xiaodong Yu, Beidi Chen, Zhaozhuo Xu

TL;DR
This paper introduces a method for efficient zeroth-order fine-tuning of large language models by focusing on a small subset of sensitive parameters and applying quantization, enabling memory-efficient training on limited hardware.
Contribution
The study demonstrates that ZO fine-tuning of a sensitive subset comprising only 0.1% of an LLM's parameters, with the remaining parameters quantized, surpasses full ZO fine-tuning performance and enables training on low-memory devices.
Findings
ZO fine-tuning of a sensitive 0.1% of the LLM's parameters outperforms full ZO fine-tuning.
Quantization combined with ZO allows training on devices with less than 8 GiB memory.
The method also yields a wall-clock speedup during fine-tuning.
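To make the memory finding concrete, the following back-of-the-envelope sketch estimates weight storage when a small tuned fraction stays in higher precision and the rest is quantized. The 7-billion-parameter model size, the 16-bit tuned precision, and the 4-bit quantized precision are illustrative assumptions, not figures taken from this summary, and the estimate ignores activations, quantization metadata, and the indices of the tuned parameters.

```python
def weight_memory_gib(n_params, tuned_fraction, tuned_bits=16, frozen_bits=4):
    """Rough weight-only memory estimate in GiB.

    All defaults are hypothetical: tuned weights kept at `tuned_bits`,
    frozen weights quantized to `frozen_bits`.
    """
    tuned = n_params * tuned_fraction * tuned_bits
    frozen = n_params * (1.0 - tuned_fraction) * frozen_bits
    return (tuned + frozen) / 8 / 2**30  # bits -> bytes -> GiB


N_PARAMS = 7e9  # assumed model size, for illustration only

sparse_quantized = weight_memory_gib(N_PARAMS, tuned_fraction=0.001)
full_16bit = weight_memory_gib(N_PARAMS, tuned_fraction=1.0)

print(f"0.1% tuned at 16-bit, rest at 4-bit: {sparse_quantized:.2f} GiB")  # about 3.27
print(f"all weights at 16-bit:               {full_16bit:.2f} GiB")        # about 13.04
```

Under these assumptions the weights alone fit well below 8 GiB, whereas the unquantized 16-bit weights do not.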
Abstract
Zeroth-order optimization (ZO) is a memory-efficient strategy for fine-tuning Large Language Models using only forward passes. However, the application of ZO fine-tuning in memory-constrained settings such as mobile phones and laptops is still challenging since full precision forward passes are infeasible. In this study, we address this limitation by integrating sparsity and quantization into ZO fine-tuning of LLMs. Specifically, we investigate the feasibility of fine-tuning an extremely small subset of LLM parameters using ZO. This approach allows the majority of un-tuned parameters to be quantized to accommodate the constraint of limited device memory. Our findings reveal that the pre-training process can identify a set of "sensitive parameters" that can guide the ZO fine-tuning of LLMs on downstream tasks. Our results demonstrate that fine-tuning 0.1% sensitive parameters in the LLM…
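As a minimal sketch of the idea described in the abstract, the snippet below runs a two-point zeroth-order SGD update that perturbs and updates only a masked subset of parameters, using forward evaluations of the loss and no backpropagation. It is a toy NumPy illustration, not the authors' implementation: the quadratic loss, the function name sparse_zo_step, the hyperparameters, and the rule used to pick the mask (largest squared gradient of the toy loss at initialization) are all assumptions made for the example, and the subset here is 1% of a 1000-dimensional vector rather than 0.1% of an LLM.

```python
import numpy as np


def sparse_zo_step(params, mask, loss_fn, lr, eps, rng):
    """One two-point zeroth-order SGD step restricted to mask == True."""
    # Random direction, zeroed outside the tuned subset.
    z = rng.standard_normal(params.shape) * mask
    # Two forward evaluations; no gradients are computed.
    loss_plus = loss_fn(params + eps * z)
    loss_minus = loss_fn(params - eps * z)
    # Scalar estimate of the directional derivative along z.
    g = (loss_plus - loss_minus) / (2.0 * eps)
    return params - lr * g * z


rng = np.random.default_rng(0)
dim, k = 1000, 10  # toy sizes (assumed): tune 10 of 1000 entries

target = rng.standard_normal(dim)


def loss_fn(w):
    # Toy quadratic loss standing in for a model's forward pass.
    return 0.5 * float(np.sum((w - target) ** 2))


params = np.zeros(dim)

# Hypothetical selection rule: keep the k entries with the largest squared
# gradient at initialization (the gradient of the toy loss is w - target).
sq_grad = (params - target) ** 2
mask = np.zeros(dim, dtype=bool)
mask[np.argsort(sq_grad)[-k:]] = True

initial_loss = loss_fn(params)
for _ in range(300):
    params = sparse_zo_step(params, mask, loss_fn, lr=0.02, eps=1e-3, rng=rng)
final_loss = loss_fn(params)

print(f"loss before: {initial_loss:.3f}, after: {final_loss:.3f}")
print("untouched entries unchanged:", bool(np.all(params[~mask] == 0.0)))
```

Because the perturbation is zero outside the mask, the frozen entries are never read in modified form or written to, which is what would allow them to be stored in a quantized format in a real model.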
Taxonomy
Topics: Electromagnetic Simulation and Numerical Methods · Particle accelerators and beam dynamics · Numerical methods for differential equations
Methods: Sparse Evolutionary Training
