Harmony in Divergence: Towards Fast, Accurate, and Memory-efficient Zeroth-order LLM Fine-tuning
Qitao Tan, Jun Liu, Zheng Zhan, Caiwei Ding, Yanzhi Wang, Xiaolong Ma, Jaewoo Lee, Jin Lu, Geng Yuan

TL;DR
This paper introduces DiZO, a divergence-driven zeroth-order optimization method that accelerates and improves the accuracy of large language model fine-tuning while significantly reducing memory usage.
Contribution
The paper presents a novel layer-wise divergence analysis and DiZO optimization, bridging the gap between zeroth-order and first-order fine-tuning in LLMs.
Findings
DiZO reduces the number of training iterations needed for convergence by up to 48%.
DiZO outperforms existing ZO baselines on multiple LLMs.
In some cases, DiZO surpasses memory-intensive FO fine-tuning.
Abstract
Large language models (LLMs) excel across various tasks, but standard first-order (FO) fine-tuning demands considerable memory, significantly limiting real-world deployment. Recently, zeroth-order (ZO) optimization has stood out as a promising memory-efficient training paradigm: it avoids backward passes and relies solely on forward passes for gradient estimation, making it attractive for resource-constrained scenarios. However, ZO methods lag far behind FO methods in both convergence speed and accuracy. To bridge the gap, we introduce a novel layer-wise divergence analysis that uncovers the distinct update patterns of FO and ZO optimization. Building on these findings and aiming to match the learning capacity of FO methods, we propose Divergence-driven Zeroth-Order (DiZO) optimization. DiZO conducts divergence-driven layer adaptation by incorporating projections into ZO updates, generating…
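To make the "forward passes only" idea concrete, here is a minimal sketch of a generic two-point (SPSA-style) zeroth-order gradient estimate on a toy quadratic loss. It illustrates plain ZO optimization as described in the abstract, not DiZO's divergence-driven projection, whose details are not given here. All names (`zo_gradient_estimate`, `zo_sgd`, `quadratic_loss`) and the hyperparameter values are assumptions chosen for illustration.

```python
import numpy as np


def zo_gradient_estimate(loss_fn, theta, eps=1e-3, seed=0):
    """Two-point gradient estimate that uses only two loss evaluations."""
    rng = np.random.default_rng(seed)
    z = rng.standard_normal(theta.shape)  # random perturbation direction
    loss_plus = loss_fn(theta + eps * z)  # forward evaluation 1
    loss_minus = loss_fn(theta - eps * z)  # forward evaluation 2
    # Scalar finite-difference estimate of the directional derivative along z.
    projected_grad = (loss_plus - loss_minus) / (2.0 * eps)
    return projected_grad * z


def zo_sgd(loss_fn, theta, lr=0.05, steps=500, eps=1e-3):
    """Plain ZO-SGD loop: no backward pass, no stored activations."""
    theta = theta.copy()
    for step in range(steps):
        # A fresh seed per step gives a fresh perturbation direction.
        theta -= lr * zo_gradient_estimate(loss_fn, theta, eps=eps, seed=step)
    return theta


def quadratic_loss(theta):
    """Toy stand-in for a model's loss (hypothetical, for illustration only)."""
    target = np.array([1.0, -2.0, 0.5])
    return float(np.sum((theta - target) ** 2))


theta0 = np.zeros(3)
theta_final = zo_sgd(quadratic_loss, theta0)
print("initial loss:", quadratic_loss(theta0))
print("final loss:", quadratic_loss(theta_final))
```

Because each step needs only two loss evaluations and one random direction, memory use stays close to that of inference. The price is a noisy gradient estimate, which is the source of the slower convergence the abstract mentions.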
Taxonomy
Topics: Electromagnetic Simulation and Numerical Methods · Iterative Learning Control Systems · Advanced Surface Polishing Techniques
