Elucidating Subspace Perturbation in Zeroth-Order Optimization: Theory and Practice at Scale
Sihwan Park, Jihun Yun, SungYub Kim, Souvik Kundu, Eunho Yang

TL;DR
This paper develops a theoretical framework for understanding subspace perturbations in zeroth-order optimization, introduces a practical block coordinate descent method (MeZO-BCD), and reports speedups of up to 2.77x over MeZO in large language model fine-tuning.
Contribution
It provides a unified theory explaining how subspace perturbations improve convergence in ZO methods and proposes an efficient block coordinate descent algorithm for large-scale applications.
Findings
MeZO-BCD achieves up to 2.77x speedup over MeZO.
Subspace perturbations reduce gradient noise and accelerate convergence.
Theoretical analysis identifies high dimensionality as the primary bottleneck in ZO optimization.
Abstract
Zeroth-order (ZO) optimization has emerged as a promising alternative to gradient-based backpropagation methods, particularly for black-box optimization and large language model (LLM) fine-tuning. However, ZO methods often suffer from slow convergence due to high-variance stochastic gradient estimators. While subspace perturbations, such as sparsity and low-rank constraints, have been explored to mitigate this issue, their effectiveness remains poorly understood. In this work, we develop a unified theoretical framework that analyzes both the convergence and generalization properties of ZO optimization under subspace perturbations. We show that high dimensionality is the primary bottleneck and introduce the notion of subspace alignment to explain how the subspace perturbations reduce gradient noise and accelerate convergence. Our analysis further shows that a broad class…
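To make the idea of a subspace perturbation concrete, here is a minimal sketch of a two-point zeroth-order gradient estimate that perturbs only one block of coordinates per step. It is a toy illustration, not the paper's MeZO-BCD implementation: the function names (`zo_block_gradient`, `zo_bcd_minimize`), the cyclic block schedule, the Gaussian perturbation, and all hyperparameter defaults are assumptions chosen for the example.

```python
import random


def zo_block_gradient(loss, theta, block, eps=1e-3, rng=random):
    """Two-point ZO gradient estimate restricted to the indices in `block`.

    Hypothetical helper: the perturbation z is Gaussian on `block`
    and zero elsewhere, so only that block receives an update.
    """
    z = [0.0] * len(theta)
    for i in block:
        z[i] = rng.gauss(0.0, 1.0)
    plus = [t + eps * zi for t, zi in zip(theta, z)]
    minus = [t - eps * zi for t, zi in zip(theta, z)]
    # Central difference approximates the directional derivative along z.
    slope = (loss(plus) - loss(minus)) / (2 * eps)
    return [slope * zi for zi in z]


def zo_bcd_minimize(loss, theta0, n_blocks=4, steps=2000, lr=0.05,
                    eps=1e-3, seed=0):
    """Cycle through contiguous blocks, taking one ZO step per block.

    The cyclic schedule and contiguous blocks are assumptions of this
    sketch, not details taken from the paper.
    """
    rng = random.Random(seed)
    theta = list(theta0)
    size = -(-len(theta) // n_blocks)  # ceiling division
    blocks = [range(s, min(s + size, len(theta)))
              for s in range(0, len(theta), size)]
    for t in range(steps):
        block = blocks[t % len(blocks)]
        grad = zo_block_gradient(loss, theta, block, eps, rng)
        for i in block:
            theta[i] -= lr * grad[i]
    return theta


if __name__ == "__main__":
    def quadratic(x):
        # Toy objective with its minimum at x = (1, ..., 1).
        return sum((xi - 1.0) ** 2 for xi in x)

    theta = zo_bcd_minimize(quadratic, [0.0] * 16)
    print("final loss:", quadratic(theta))
```

Each step needs only two loss evaluations and no backpropagation, and the random direction lives in a block of 4 coordinates rather than all 16, which is the kind of dimensionality reduction the abstract associates with lower gradient noise.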
