Accelerating Zeroth-Order Spectral Optimization with Partial Orthogonalization from Power Iteration

Jiahe Chen; Ziye Ma

arXiv:2605.09034 · cs.LG · May 18, 2026

TL;DR

This paper introduces partial orthogonalization via power iteration to accelerate zeroth-order spectral optimization, converging 1.5x to 4x faster than ZO-Muon when fine-tuning large language models.

Contribution

The paper proposes replacing Newton-Schulz orthogonalization with a streaming power-iteration method that is more efficient and more robust to the noise in zeroth-order gradient estimates.
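To make the idea of partial orthogonalization concrete, here is a minimal NumPy sketch. It is not the authors' implementation: it assumes that "partial" means keeping only the top `rank` singular directions of a gradient estimate `G` and setting their singular values to 1, and it uses plain block power iteration in place of the paper's streaming variant, whose details are not given on this page. The function name `partial_orthogonalize` and its parameters are hypothetical.

```python
import numpy as np

def partial_orthogonalize(G, rank, iters=20, seed=0):
    """Approximate U_k @ V_k.T, where U_k, V_k hold the top-`rank`
    singular vectors of G (illustrative sketch, not the paper's code)."""
    rng = np.random.default_rng(seed)
    # Random orthonormal start for the left singular subspace.
    Q, _ = np.linalg.qr(rng.standard_normal((G.shape[0], rank)))
    for _ in range(iters):
        # One block power step on G @ G.T, then re-orthonormalize.
        Q, _ = np.linalg.qr(G @ (G.T @ Q))
    # Project G onto the subspace and resolve it with a small SVD,
    # so the result does not depend on how Q is rotated inside the subspace.
    B = Q.T @ G                                  # shape (rank, n)
    Ub, _, Vt = np.linalg.svd(B, full_matrices=False)
    return (Q @ Ub) @ Vt                         # singular values all set to 1

# Toy usage: a noisy 8x6 "gradient" whose top-2 directions are kept.
rng = np.random.default_rng(1)
G_demo = rng.standard_normal((8, 6))
update = partial_orthogonalize(G_demo, rank=2)
print(update.shape)
```

Full orthogonalization would instead set every singular value of `G` to 1, including those dominated by estimation noise. Truncating at `rank` is one way to avoid amplifying such directions, under the assumptions stated above.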

Findings

1. Achieves 1.5x to 4x faster convergence than ZO-Muon.
2. Reaches competitive final accuracies with less training time.
3. Demonstrates effectiveness across multiple large language models.

Abstract

Zeroth-order (ZO) optimization has become increasingly popular for fine-tuning large language models (LLMs), especially on edge devices, because it can adapt a model to local data without memory-intensive back-propagation. Recent works try to reduce ZO variance through low-dimensional subspace search, but subspace restriction alone leaves key optimization geometry under-exploited, motivating additional acceleration. In this work, we focus on the hidden-layer training problem, in which spectral optimizers like Muon outperform AdamW due to their ability to exploit weak spectral directions through orthogonalization. However, we find that, unlike in the first-order setting, full orthogonalization works poorly in the ZO setting because the gradient estimates are highly noisy and unreliable. To address this issue, we propose applying partial spectral…
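For readers unfamiliar with zeroth-order methods, the following is a minimal sketch of a standard two-point random-perturbation gradient estimator, which needs only forward loss evaluations. It is a generic estimator and not necessarily the one used in the paper. The name `zo_gradient`, the step size `eps`, and the toy quadratic loss are illustrative assumptions.

```python
import numpy as np

def zo_gradient(loss, W, eps=1e-3, rng=None):
    """Two-point zeroth-order gradient estimate of `loss` at W.
    Uses two forward evaluations and no back-propagation."""
    rng = np.random.default_rng() if rng is None else rng
    Z = rng.standard_normal(W.shape)             # random probe direction
    # Central finite difference along Z approximates <grad, Z>.
    slope = (loss(W + eps * Z) - loss(W - eps * Z)) / (2.0 * eps)
    return slope * Z                             # noisy estimate of the gradient

# Toy usage on a quadratic loss (hypothetical stand-in for a model loss).
target = np.zeros((4, 3))
quadratic = lambda M: 0.5 * np.sum((M - target) ** 2)
W0 = np.ones((4, 3))
g_hat = zo_gradient(quadratic, W0, rng=np.random.default_rng(0))
W1 = W0 - 0.01 * g_hat                           # one plain ZO-SGD step
print(quadratic(W0), quadratic(W1))
```

A single estimate points along one random direction, so its error is large relative to a true gradient. This is the noise that, according to the abstract, makes full orthogonalization unreliable in the ZO setting.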

Code & Models

Repositories

MOFA-LAB/ZO-MOPI.git (GitHub)
