Harmony in Divergence: Towards Fast, Accurate, and Memory-efficient Zeroth-order LLM Fine-tuning

Qitao Tan; Jun Liu; Zheng Zhan; Caiwei Ding; Yanzhi Wang; Xiaolong Ma; Jaewoo Lee; Jin Lu; Geng Yuan

arXiv:2502.03304 · cs.LG · November 4, 2025

TL;DR

This paper introduces DiZO, a divergence-driven zeroth-order (ZO) optimization method that speeds up large language model (LLM) fine-tuning and improves its accuracy while using far less memory than first-order (FO) fine-tuning.

Contribution

The paper presents a novel layer-wise divergence analysis and, building on it, the DiZO optimizer, which helps bridge the gap between zeroth-order and first-order fine-tuning of LLMs.
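To make "layer-wise divergence" concrete, here is a minimal Python sketch. It assumes, for illustration only, that a layer's divergence is the L2 distance between its current weights and its pretrained weights, and it pairs this with a hypothetical projection that rescales a layer's deviation from the pretrained weights by a factor `gamma`. The helpers `layer_divergence` and `rescale_layer`, the toy layer names, and the rescaling rule are all assumptions, not DiZO's published definitions.

```python
import math


def layer_divergence(pretrained, current):
    """Assumed definition: L2 distance ||W_t - W_0|| of each layer from its pretrained weights."""
    return {
        name: math.sqrt(sum((wt - w0) ** 2 for wt, w0 in zip(current[name], w0s)))
        for name, w0s in pretrained.items()
    }


def rescale_layer(pretrained, current, name, gamma):
    """Hypothetical projection: W_0 + gamma * (W_t - W_0) for a single layer."""
    return [w0 + gamma * (wt - w0) for wt, w0 in zip(current[name], pretrained[name])]


# Toy "model" with two layers, each flattened to a list of weights (hypothetical values).
pretrained = {"layer0": [0.0, 0.0], "layer1": [1.0, 1.0]}
current = {"layer0": [3.0, 4.0], "layer1": [1.0, 1.0]}

print(layer_divergence(pretrained, current))  # {'layer0': 5.0, 'layer1': 0.0}

# Pull layer0 halfway back toward its pretrained weights.
current["layer0"] = rescale_layer(pretrained, current, "layer0", 0.5)
print(layer_divergence(pretrained, current))  # {'layer0': 2.5, 'layer1': 0.0}
```

Setting `gamma` below 1 pulls a layer back toward its pretrained weights, so its divergence shrinks by the same factor (here from 5.0 to 2.5), while layers that have not moved are unaffected.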

Findings

1. DiZO reduces the number of training iterations needed to converge by up to 48%.
2. DiZO outperforms existing ZO baselines on multiple LLMs.
3. In some cases, DiZO surpasses memory-intensive FO fine-tuning.

Abstract

Large language models (LLMs) excel across various tasks, but standard first-order (FO) fine-tuning demands considerable memory, significantly limiting real-world deployment. Recently, zeroth-order (ZO) optimization has emerged as a promising memory-efficient training paradigm: it avoids backward passes and relies solely on forward passes for gradient estimation, making it attractive for resource-constrained scenarios. However, ZO methods lag far behind FO methods in both convergence speed and accuracy. To bridge the gap, we introduce a novel layer-wise divergence analysis that uncovers the distinct update patterns of FO and ZO optimization. Building on these findings, and aiming to match the learning capacity of FO methods, we propose Divergence-driven Zeroth-Order (DiZO) optimization. DiZO conducts divergence-driven layer adaptation by incorporating projections into ZO updates, generating…
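The forward-pass-only gradient estimation mentioned in the abstract can be illustrated with a generic two-point (SPSA-style) estimator of the kind commonly used for memory-efficient ZO fine-tuning (e.g., MeZO): perturb all parameters along a random Gaussian direction z, evaluate the loss at theta + eps*z and theta - eps*z, and scale z by the finite difference. This is a minimal sketch of a plain ZO-SGD baseline on a toy quadratic loss, not DiZO itself; `zo_gradient`, `zo_sgd`, `quadratic`, and all hyperparameters are hypothetical.

```python
import random


def zo_gradient(loss_fn, theta, eps, rng):
    """Two-point zeroth-order gradient estimate that uses only loss evaluations (no backward pass)."""
    z = [rng.gauss(0.0, 1.0) for _ in theta]            # random Gaussian direction
    plus = [t + eps * zi for t, zi in zip(theta, z)]    # theta + eps * z
    minus = [t - eps * zi for t, zi in zip(theta, z)]   # theta - eps * z
    # Scalar directional derivative estimated by a central finite difference along z.
    proj = (loss_fn(plus) - loss_fn(minus)) / (2.0 * eps)
    return [proj * zi for zi in z]                      # gradient estimate = proj * z


def zo_sgd(loss_fn, theta, lr=0.05, eps=1e-3, steps=500, seed=0):
    """Plain SGD driven by the ZO estimate above (hypothetical baseline settings)."""
    rng = random.Random(seed)
    theta = list(theta)
    for _ in range(steps):
        g = zo_gradient(loss_fn, theta, eps, rng)
        theta = [t - lr * gi for t, gi in zip(theta, g)]
    return theta


def quadratic(theta):
    """Toy loss with its minimum at a fixed target vector."""
    target = [1.0, -2.0, 0.5]
    return sum((t - c) ** 2 for t, c in zip(theta, target))


start = [0.0, 0.0, 0.0]
final = zo_sgd(quadratic, start)
print(quadratic(start), quadratic(final))
```

Each step costs two loss evaluations and no backward pass, which is why ZO methods save memory; the price is a noisy, single-direction gradient estimate, consistent with the abstract's point that ZO methods converge more slowly than FO methods.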

