Enhancing Zeroth-order Fine-tuning for Language Models with Low-rank Structures

Yiming Chen; Yuan Zhang; Liyuan Cao; Kun Yuan; Zaiwen Wen

arXiv:2410.07698·cs.LG·October 11, 2024

TL;DR

This paper introduces LOZO, a low-rank zeroth-order (ZO) optimization method for fine-tuning large language models. It captures the low-rank gradient structure common in LLM fine-tuning, which improves on existing ZO methods while avoiding the activation storage that back-propagation requires.

Contribution

The paper proposes a novel low-rank zeroth-order gradient estimator and a corresponding algorithm, LOZO, with convergence guarantees and enhanced performance in LLM fine-tuning.

Findings

1. LOZO outperforms existing ZO methods in experiments.
2. LOZO closely matches first-order fine-tuning performance.
3. The low-rank approach reduces memory overhead significantly.

Abstract

Parameter-efficient fine-tuning (PEFT) significantly reduces memory costs when adapting large language models (LLMs) for downstream applications. However, traditional first-order (FO) fine-tuning algorithms incur substantial memory overhead due to the need to store activation values for back-propagation during gradient computation, particularly in long-context fine-tuning tasks. Zeroth-order (ZO) algorithms offer a promising alternative by approximating gradients using finite differences of function values, thus eliminating the need for activation storage. Nevertheless, existing ZO methods struggle to capture the low-rank gradient structure common in LLM fine-tuning, leading to suboptimal performance. This paper proposes a low-rank ZO gradient estimator and introduces a novel low-rank ZO algorithm (LOZO) that effectively captures this structure in LLMs. We provide convergence guarantees…
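
To make the idea in the abstract concrete, here is a minimal NumPy sketch of a finite-difference ZO gradient estimate, first with a dense random perturbation and then with a low-rank one. It is an illustration under stated assumptions, not the paper's algorithm: the function names, the central-difference form, the Gaussian factors `U` and `V`, and the `1 / rank` scaling are all choices made here for the example. Only loss values are used, so no activations need to be stored for back-propagation.

```python
import numpy as np


def zo_grad_full(loss, W, eps, rng):
    """Dense ZO estimate: perturb W along a full Gaussian direction Z."""
    Z = rng.standard_normal(W.shape)
    # Central finite difference of the loss along Z (two forward passes).
    scale = (loss(W + eps * Z) - loss(W - eps * Z)) / (2.0 * eps)
    return scale * Z


def zo_grad_low_rank(loss, W, rank, eps, rng):
    """Low-rank ZO estimate (illustrative): perturb W along Z = U @ V.T.

    The returned estimate has rank at most `rank`. The 1/rank factor is an
    assumption of this sketch; with independent standard Gaussian U and V it
    makes the expectation of the exact directional-derivative version equal
    to the true gradient.
    """
    m, n = W.shape
    U = rng.standard_normal((m, rank))
    V = rng.standard_normal((n, rank))
    Z = U @ V.T
    scale = (loss(W + eps * Z) - loss(W - eps * Z)) / (2.0 * eps)
    return scale * Z / rank


# Toy usage: a quadratic loss whose true gradient is W - T.
rng = np.random.default_rng(0)
W = rng.standard_normal((6, 4))
T = rng.standard_normal((6, 4))


def toy_loss(M):
    return 0.5 * np.sum((M - T) ** 2)


g_low_rank = zo_grad_low_rank(toy_loss, W, rank=2, eps=1e-3, rng=rng)
g_dense = zo_grad_full(toy_loss, W, eps=1e-3, rng=rng)
```

In this toy setting the low-rank estimate is a rank-2 matrix built from two thin factors, whereas the dense estimate is generically full rank. Whether that structure helps in practice depends on how close the true gradient is to low rank, which is the premise the abstract states for LLM fine-tuning.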

Code & Models

Repositories

optsuite/LOZO (PyTorch, official)

Taxonomy

Topics: Topic Modeling