CurvZO: Adaptive Curvature-Guided Sparse Zeroth-Order Optimization for Efficient LLM Fine-Tuning
Shuo Wang, Ziyu Chen, Ming Tang

TL;DR
CurvZO introduces an adaptive, curvature-guided sparse zeroth-order optimization method that enhances large language model fine-tuning efficiency by reducing variance and dynamically adjusting perturbations based on curvature signals.
Contribution
This paper presents CurvZO, a novel ZO optimization algorithm that adaptively leverages curvature information to improve parameter sampling and convergence in LLM fine-tuning.
Findings
Improves fine-tuning accuracy by up to 4.4 points.
Achieves up to 2x speedup in training time.
Maintains memory efficiency while enhancing performance.
Abstract
Fine-tuning large language models (LLMs) with backpropagation achieves high performance but incurs substantial memory overhead, limiting scalability on resource-constrained hardware. Zeroth-order (ZO) optimization provides a memory-efficient alternative by relying solely on forward passes, yet it typically suffers from slow or unstable convergence due to high-variance gradient estimates. Sparse ZO updates partially address this issue by perturbing only a subset of parameters, but their effectiveness hinges on selecting informative parameters, which is challenging in ZO optimization because each query yields only scalar feedback. We propose Adaptive Curvature-Guided Sparse Zeroth-Order Optimization (CurvZO), which tracks curvature signals online from scalar ZO feedback and leverages these signals to construct a parameter-wise sampling distribution for selecting coordinates at…
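The general idea of a sparse ZO update driven by a per-parameter sampling distribution can be sketched in a few lines of NumPy. This is a minimal illustration, not the CurvZO algorithm itself: the function names, the `budget` and `floor` parameters, and in particular the score rule in `update_scores` (a running average of the squared scalar feedback) are assumptions made for this example, since the abstract does not specify how the curvature signal is computed.

```python
import numpy as np

def sparse_zo_step(loss_fn, theta, probs, rng, eps=1e-3, lr=5e-3):
    """One sparse two-point zeroth-order step (generic sketch, not CurvZO)."""
    # Select coordinates independently; probs[i] is the chance of perturbing i.
    mask = rng.random(theta.shape) < probs
    # Gaussian perturbation restricted to the selected coordinates.
    z = rng.standard_normal(theta.shape) * mask
    # Two forward passes yield one scalar: the finite-difference slope along z.
    g_scalar = (loss_fn(theta + eps * z) - loss_fn(theta - eps * z)) / (2 * eps)
    # Dividing by probs compensates for the mask, keeping the estimate
    # unbiased to first order in eps.
    grad_est = g_scalar * z / probs
    return theta - lr * grad_est, g_scalar, mask

def update_scores(scores, g_scalar, mask, beta=0.9):
    """ASSUMED proxy signal: EMA of squared scalar feedback on perturbed coords."""
    new_scores = scores.copy()
    new_scores[mask] = beta * scores[mask] + (1 - beta) * g_scalar ** 2
    return new_scores

def scores_to_probs(scores, budget, floor=1e-3):
    """Map scores to sampling probabilities.

    Before clipping, the probabilities sum to `budget`, the target number of
    perturbed coordinates per step (hypothetical parameter).
    """
    probs = scores / scores.sum() * budget
    return np.clip(probs, floor, 1.0)
```

A training loop would alternate the three calls: sample and step with `sparse_zo_step`, feed the returned scalar and mask to `update_scores`, then refresh the distribution with `scores_to_probs`. The floor on the probabilities keeps the `1 / probs` scaling bounded, which matters because rarely sampled coordinates otherwise receive very large updates.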
Taxonomy
Topics: Model Reduction and Neural Networks · Stochastic Gradient Optimization Techniques · Generative Adversarial Networks and Image Synthesis
