Zeroth-Order Fine-Tuning of LLMs with Extreme Sparsity

Wentao Guo; Jikai Long; Yimeng Zeng; Zirui Liu; Xinyu Yang; Yide Ran; Jacob R. Gardner; Osbert Bastani; Christopher De Sa; Xiaodong Yu; Beidi Chen; Zhaozhuo Xu

arXiv:2406.02913·cs.LG·June 6, 2024

Open Access

TL;DR

This paper introduces a method for efficient zeroth-order fine-tuning of large language models by focusing on a small subset of sensitive parameters and applying quantization, enabling memory-efficient training on limited hardware.

Contribution

The study demonstrates that ZO fine-tuning of only the 0.1% of parameters identified as sensitive, with the remaining parameters quantized, surpasses full ZO fine-tuning performance and enables training on low-memory devices.

Findings

1. Fine-tuning the 0.1% of parameters identified as sensitive outperforms full ZO fine-tuning.

2. Quantization combined with ZO allows training on devices with less than 8 GiB of memory.

3. The approach also yields a wall-clock speedup during fine-tuning.

Abstract

Zeroth-order optimization (ZO) is a memory-efficient strategy for fine-tuning Large Language Models using only forward passes. However, the application of ZO fine-tuning in memory-constrained settings such as mobile phones and laptops is still challenging since full precision forward passes are infeasible. In this study, we address this limitation by integrating sparsity and quantization into ZO fine-tuning of LLMs. Specifically, we investigate the feasibility of fine-tuning an extremely small subset of LLM parameters using ZO. This approach allows the majority of un-tuned parameters to be quantized to accommodate the constraint of limited device memory. Our findings reveal that the pre-training process can identify a set of "sensitive parameters" that can guide the ZO fine-tuning of LLMs on downstream tasks. Our results demonstrate that fine-tuning 0.1% sensitive parameters in the LLM…
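The idea in the abstract, a forward-pass-only update applied to a very small set of parameters, can be illustrated with a minimal sketch. This is not the authors' implementation: it uses a generic two-point (SPSA-style) gradient estimate on a toy quadratic loss, and the names `top_fraction_mask` and `sparse_zo_step`, the squared-gradient sensitivity score, and all hyperparameters are assumptions chosen for illustration. The sensitivity criterion the paper actually uses is not stated in this excerpt.

```python
import random


def top_fraction_mask(scores, fraction):
    """Return a 0/1 mask that keeps the top `fraction` of entries by score."""
    k = max(1, round(fraction * len(scores)))
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    keep = set(order[:k])
    return [1.0 if i in keep else 0.0 for i in range(len(scores))]


def sparse_zo_step(loss_fn, theta, mask, lr, eps, rng):
    """One zeroth-order SGD step that only touches masked coordinates.

    Uses two forward evaluations of `loss_fn` and no backward pass.
    """
    # Random direction, zeroed outside the selected coordinates.
    z = [rng.gauss(0.0, 1.0) * m for m in mask]
    plus = [t + eps * zi for t, zi in zip(theta, z)]
    minus = [t - eps * zi for t, zi in zip(theta, z)]
    # Central-difference estimate of the directional derivative along z.
    g = (loss_fn(plus) - loss_fn(minus)) / (2.0 * eps)
    return [t - lr * g * zi for t, zi in zip(theta, z)]


# Toy stand-in for a model: a weighted quadratic loss with its minimum at 0.
dim = 1000
weights = [0.1 + 0.9 * i / (dim - 1) for i in range(dim)]


def loss(theta):
    return 0.5 * sum(w * t * t for w, t in zip(weights, theta))


theta0 = [1.0] * dim

# Assumed sensitivity proxy: squared gradient magnitude at the starting point.
scores = [(w * t) ** 2 for w, t in zip(weights, theta0)]
mask = top_fraction_mask(scores, fraction=0.01)  # tune 1% of the entries

rng = random.Random(0)
theta = list(theta0)
for _ in range(200):
    theta = sparse_zo_step(loss, theta, mask, lr=0.02, eps=1e-3, rng=rng)

print("tuned entries:", int(sum(mask)))
print("loss before:", loss(theta0))
print("loss after: ", loss(theta))
```

Because entries outside the mask are never perturbed or updated, they stay bit-for-bit identical to their initial values, which is what makes it possible in principle to store them in a quantized, read-only form.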

Taxonomy

Topics: Electromagnetic Simulation and Numerical Methods · Particle accelerators and beam dynamics · Numerical methods for differential equations

Methods: Sparse Evolutionary Training