AGZO: Activation-Guided Zeroth-Order Optimization for LLM Fine-Tuning

Wei Lin; Yining Jiang; Qingyu Song; Qiao Xiang; Hong Xu

arXiv:2601.17261 · cs.LG · February 11, 2026

TL;DR

AGZO introduces an activation-guided approach to zeroth-order optimization for fine-tuning large language models, leveraging activation structure to improve efficiency and performance under memory constraints.

Contribution

The paper proposes a novel activation-informed subspace method for ZO optimization, with theoretical guarantees and empirical improvements over existing ZO approaches.

Findings

01 AGZO outperforms state-of-the-art ZO baselines.

02 AGZO narrows the performance gap with first-order fine-tuning.

03 AGZO maintains a memory footprint similar to that of other ZO methods.

Abstract

Zeroth-Order (ZO) optimization has emerged as a promising solution for fine-tuning LLMs under strict memory constraints, as it avoids the prohibitive memory cost of storing activations for backpropagation. However, existing ZO methods typically employ isotropic perturbations, neglecting the rich structural information available during the forward pass. In this paper, we identify a crucial link between gradient formation and activation structure: the gradient of a linear layer is confined to the subspace spanned by its input activations. Leveraging this insight, we propose Activation-Guided Zeroth-Order optimization (AGZO). Unlike prior methods, AGZO extracts a compact, activation-informed subspace on the fly during the forward pass and restricts perturbations to this low-rank subspace. We provide a theoretical framework showing that AGZO optimizes a subspace-smoothed objective and…
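
The structural claim in the abstract can be checked on a toy linear layer. The block below is a minimal pure-Python sketch under stated assumptions, not the authors' implementation. It assumes a squared-error loss and uses Gram-Schmidt as a stand-in for the subspace extraction step, which the abstract does not specify. It pairs this with a standard two-point ZO estimator whose perturbation is confined to the activation subspace. All function and variable names (`orthonormal_basis`, `subspace_zo_gradient`, `Q`, `Z`) are hypothetical. For a layer Y = X W^T the backprop gradient is (Y - T)^T X, so each of its rows is a linear combination of the input activations (the rows of X).

```python
import random

def matmul(A, B):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def orthonormal_basis(vectors, tol=1e-10):
    """Gram-Schmidt; an assumed stand-in for the subspace extraction step."""
    basis = []
    for v in vectors:
        w = list(v)
        for q in basis:
            c = sum(wi * qi for wi, qi in zip(w, q))
            w = [wi - c * qi for wi, qi in zip(w, q)]
        norm = sum(wi * wi for wi in w) ** 0.5
        if norm > tol:
            basis.append([wi / norm for wi in w])
    return basis

def loss(W, X, T):
    """0.5 * ||X W^T - T||^2 for a linear layer with weight W (d_out x d_in)."""
    Y = matmul(X, transpose(W))
    return 0.5 * sum((y - t) ** 2 for yr, tr in zip(Y, T) for y, t in zip(yr, tr))

def true_gradient(W, X, T):
    """Backprop gradient dL/dW = (Y - T)^T X, used here only for checking."""
    Y = matmul(X, transpose(W))
    D = [[y - t for y, t in zip(yr, tr)] for yr, tr in zip(Y, T)]
    return matmul(transpose(D), X)

def project_rows(M, Q):
    """Project each row of M onto the span of the orthonormal rows of Q."""
    return matmul(matmul(M, transpose(Q)), Q)

def max_abs_diff(A, B):
    return max(abs(a - b) for ra, rb in zip(A, B) for a, b in zip(ra, rb))

def subspace_zo_gradient(W, X, T, Q, rng, eps=1e-3):
    """Two-point ZO estimate with the perturbation confined to span(Q)."""
    d_out, r = len(W), len(Q)
    U = [[rng.gauss(0.0, 1.0) for _ in range(r)] for _ in range(d_out)]
    Z = matmul(U, Q)  # d_out x d_in; every row lies in span(Q)
    W_plus = [[w + eps * z for w, z in zip(wr, zr)] for wr, zr in zip(W, Z)]
    W_minus = [[w - eps * z for w, z in zip(wr, zr)] for wr, zr in zip(W, Z)]
    slope = (loss(W_plus, X, T) - loss(W_minus, X, T)) / (2 * eps)
    return [[slope * z for z in zr] for zr in Z], Z, slope

rng = random.Random(0)
n, d_in, d_out = 3, 8, 4  # fewer samples than input features
X = [[rng.gauss(0.0, 1.0) for _ in range(d_in)] for _ in range(n)]
T = [[rng.gauss(0.0, 1.0) for _ in range(d_out)] for _ in range(n)]
W = [[rng.gauss(0.0, 1.0) for _ in range(d_in)] for _ in range(d_out)]

Q = orthonormal_basis(X)  # basis of the activation subspace
G = true_gradient(W, X, T)
grad_residual = max_abs_diff(G, project_rows(G, Q))

g_hat, Z, slope = subspace_zo_gradient(W, X, T, Q, rng)
directional = sum(g * z for gr, zr in zip(G, Z) for g, z in zip(gr, zr))
print("gradient component outside activation span:", grad_residual)
print("finite-difference slope vs <G, Z>:", slope, directional)
```

In this toy setting the perturbation has d_out × r free coordinates instead of d_out × d_in, and the estimate needs only forward evaluations of the loss. The activation basis is computed from the same inputs the forward pass already sees. How AGZO chooses the rank and extracts the basis in practice is described in the paper and is not reproduced here.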

Taxonomy

Topics: Model Reduction and Neural Networks · Stochastic Gradient Optimization Techniques · Topology Optimization in Engineering