Robust and Efficient Zeroth-Order LLM Fine-Tuning via Adaptive Bayesian Subspace Optimizer
Jian Feng, Zhihong Huang

TL;DR
This paper introduces BSZO, a Bayesian subspace zeroth-order optimizer for fine-tuning large language models, which enhances convergence and robustness while reducing memory usage, outperforming existing methods across multiple models and tasks.
Contribution
The paper proposes BSZO, a novel adaptive Bayesian ZO optimizer that leverages Kalman filtering and subspace methods to improve LLM fine-tuning efficiency and robustness.
Findings
BSZO achieves up to 6.67% absolute improvement on OPT-13B.
BSZO maintains robustness under low-precision training.
BSZO keeps memory usage close to that of inference-only baselines.
Abstract
Fine-tuning large language models (LLMs) with zeroth-order (ZO) optimization reduces memory by approximating gradients through function evaluations. However, existing methods essentially perform updates in a one-dimensional space, and suffer from collapse or substantial performance degradation under low-precision training. We introduce BSZO, an adaptive Bayesian Subspace Zeroth-Order Optimizer, which applies Kalman filtering to combine finite-difference information across multiple perturbation directions within a subspace. By treating each finite-difference measurement as a noisy observation, BSZO builds a posterior distribution over the subspace-projected gradient and updates it through Bayesian inference, with a residual-based adaptive mechanism to adapt to noise variations. Theoretical analysis shows that BSZO improves the convergence rate by a…
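The idea of fusing finite-difference measurements with a Kalman filter can be illustrated with a small NumPy sketch. This is not the authors' implementation: the function name `kalman_subspace_gradient`, the random basis `B`, the choice of coordinate directions inside the subspace, the default hyperparameters, and in particular the smoothing rule used to adapt the noise variance `r` are all assumptions made for illustration. The sketch treats each central difference along a subspace direction as a scalar noisy observation of the subspace-projected gradient and applies the standard scalar-observation Kalman update to a Gaussian posterior.

```python
import numpy as np


def kalman_subspace_gradient(f, theta, B, eps=1e-3, prior_var=1.0,
                             noise_var=1e-6, beta=0.0):
    """Estimate g_s = B^T grad f(theta) from k finite-difference probes.

    Hypothetical sketch: B is an (n, k) subspace basis, and each probe
    direction is a coordinate vector z inside the subspace.
    """
    k = B.shape[1]
    m = np.zeros(k)               # posterior mean of the subspace gradient
    P = prior_var * np.eye(k)     # posterior covariance
    r = noise_var                 # current observation-noise variance

    for i in range(k):
        z = np.zeros(k)
        z[i] = 1.0
        d = B @ z                 # probe direction in parameter space

        # Central difference: a noisy observation of z^T g_s.
        y = (f(theta + eps * d) - f(theta - eps * d)) / (2.0 * eps)

        # Standard Kalman update for a scalar observation y = z^T g_s + noise.
        resid = y - z @ m
        Pz = P @ z
        prior_obs_var = z @ Pz
        gain = Pz / (prior_obs_var + r)
        m = m + gain * resid
        P = P - np.outer(gain, Pz)

        # Assumed residual-based adaptation (illustrative only): move r toward
        # the innovation-based estimate resid^2 - z^T P z, floored at noise_var.
        r = (1.0 - beta) * r + beta * max(resid**2 - prior_obs_var, noise_var)

    return m, P


# Toy objective: f(theta) = 0.5 * ||theta - target||^2.
rng = np.random.default_rng(0)
n, k = 50, 4
target = rng.standard_normal(n)
theta = rng.standard_normal(n)
B = rng.standard_normal((n, k)) / np.sqrt(n)


def loss(x):
    return 0.5 * np.sum((x - target) ** 2)


g_mean, g_cov = kalman_subspace_gradient(loss, theta, B)
theta_next = theta - 0.1 * (B @ g_mean)   # one descent step inside the subspace
```

With `beta=0.0` the noise variance stays fixed and the loop reduces to a plain Kalman filter; setting `beta` above zero lets large residuals inflate `r`, which down-weights later measurements. The posterior covariance `g_cov` indicates how much uncertainty remains along each subspace direction after the probes.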
Taxonomy
Topics: Metaheuristic Optimization Algorithms Research · Speech Recognition and Synthesis · Topic Modeling
