Enhancing Zeroth-order Fine-tuning for Language Models with Low-rank Structures
Yiming Chen, Yuan Zhang, Liyuan Cao, Kun Yuan, Zaiwen Wen

TL;DR
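This paper introduces LOZO, a low-rank zeroth-order (ZO) optimization method for memory-efficient fine-tuning of large language models. Its gradient estimator captures the low-rank gradient structure common in LLM fine-tuning, improving on existing ZO methods while avoiding the activation storage that back-propagation requires.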
Contribution
The paper proposes a novel low-rank zeroth-order gradient estimator and a corresponding algorithm, LOZO, with convergence guarantees and enhanced performance in LLM fine-tuning.
Findings
LOZO outperforms existing ZO methods in experiments.
LOZO closely matches first-order fine-tuning performance.
The low-rank approach reduces memory overhead significantly.
Abstract
Parameter-efficient fine-tuning (PEFT) significantly reduces memory costs when adapting large language models (LLMs) for downstream applications. However, traditional first-order (FO) fine-tuning algorithms incur substantial memory overhead due to the need to store activation values for back-propagation during gradient computation, particularly in long-context fine-tuning tasks. Zeroth-order (ZO) algorithms offer a promising alternative by approximating gradients using finite differences of function values, thus eliminating the need for activation storage. Nevertheless, existing ZO methods struggle to capture the low-rank gradient structure common in LLM fine-tuning, leading to suboptimal performance. This paper proposes a low-rank ZO gradient estimator and introduces a novel low-rank ZO algorithm (LOZO) that effectively captures this structure in LLMs. We provide convergence guarantees…
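The abstract does not spell out the estimator, so the following is only a minimal sketch of the general idea, not the paper's algorithm. It assumes a two-point (central) finite difference of the loss along a random perturbation built from two thin Gaussian factors, so the perturbation has rank at most `rank`. The function name `lowrank_zo_grad`, the parameters `rank` and `eps`, the Gaussian sampling, the `1/rank` scaling, and the toy quadratic loss are all illustrative assumptions.

```python
import numpy as np

def lowrank_zo_grad(loss, X, rank=2, eps=1e-3, rng=None):
    """Sketch of a two-point zeroth-order gradient estimate for a matrix
    parameter X, using a random perturbation of rank at most `rank`.
    Hypothetical helper; sampling and scaling are assumptions."""
    rng = np.random.default_rng() if rng is None else rng
    m, n = X.shape
    U = rng.standard_normal((m, rank))  # thin left factor (assumed Gaussian)
    V = rng.standard_normal((n, rank))  # thin right factor (assumed Gaussian)
    Z = U @ V.T                         # perturbation direction, rank <= `rank`
    # Central finite difference of the loss along Z: two function
    # evaluations, no back-propagation and no stored activations.
    slope = (loss(X + eps * Z) - loss(X - eps * Z)) / (2.0 * eps)
    # Scale the direction by the measured slope; the estimate inherits
    # the low rank of Z.
    return slope * Z / rank

# Toy usage on a quadratic loss whose true gradient is X - target.
rng = np.random.default_rng(0)
target = rng.standard_normal((6, 5))
X = np.zeros((6, 5))
loss = lambda W: 0.5 * np.sum((W - target) ** 2)
g = lowrank_zo_grad(loss, X, rank=2, rng=rng)
```

Only the two factors `U` and `V` need to be sampled per step, which is where a low-rank perturbation can save memory compared with a full-size random matrix; how LOZO schedules and reuses these factors is not described in the text above.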
Taxonomy
Topics: Topic Modeling
