Zero-Order Optimization for LLM Fine-Tuning via Learnable Direction Sampling

Valery Parfenov; Grigoriy Evseev; Andrey Veprikov; Nikolay Bushkov; Stanislav Moiseev; Aleksandr Beznosikov

arXiv:2602.13659·cs.LG·February 17, 2026

Open Access

TL;DR

This paper introduces a learnable sampling policy over perturbation directions for zero-order optimization in large language model fine-tuning. The policy is updated to reduce the variance of gradient estimates, and because no backpropagation is needed, training stays memory-efficient.

Contribution

The paper proposes a policy-driven zero-order framework, with a theoretical analysis and a practical algorithm, that improves gradient estimates for large-scale NLP models.

Findings

01. Enhanced fine-tuning performance on LLM benchmarks

02. Reduced variance in gradient estimation

03. Relaxed dependence on parameter dimensionality

Abstract

Fine-tuning large pretrained language models (LLMs) is a cornerstone of modern NLP, yet its growing memory demands (driven by backpropagation and large optimizer states) limit deployment in resource-constrained settings. Zero-order (ZO) methods bypass backpropagation by estimating directional derivatives from forward evaluations, offering substantial memory savings. However, classical ZO estimators suffer from high variance and an adverse dependence on the parameter dimensionality $d$, which has constrained their use to low-dimensional problems. In this work, we propose a policy-driven ZO framework that treats the sampling distribution over perturbation directions as a learnable policy and updates it to reduce the variance of directional estimates. We develop a practical algorithm implementing this idea and provide a theoretical analysis, showing that learned sampling distributions…
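To make the baseline concrete, the following is a minimal sketch of the classical two-point ZO estimator that the abstract contrasts against. It is not the paper's learned-policy algorithm. It assumes isotropic Gaussian perturbation directions and a toy quadratic objective, and the names `zo_two_point_grad`, `quadratic`, and `mean_sq_error` are hypothetical, chosen only for illustration.

```python
import random


def zo_two_point_grad(f, x, mu=1e-3, num_dirs=1, rng=None):
    """Two-point ZO gradient estimate, averaged over random directions.

    Assumption: directions are drawn from a fixed standard Gaussian,
    i.e. the classical (non-learned) sampling distribution.
    """
    if rng is None:
        rng = random.Random(0)
    d = len(x)
    grad = [0.0] * d
    for _ in range(num_dirs):
        u = [rng.gauss(0.0, 1.0) for _ in range(d)]
        x_plus = [xi + mu * ui for xi, ui in zip(x, u)]
        x_minus = [xi - mu * ui for xi, ui in zip(x, u)]
        # Two forward evaluations approximate the directional derivative
        # <grad f(x), u>; no backpropagation is involved.
        slope = (f(x_plus) - f(x_minus)) / (2.0 * mu)
        for i in range(d):
            grad[i] += slope * u[i] / num_dirs
    return grad


def quadratic(x):
    """Toy objective f(x) = 0.5 * ||x||^2, whose true gradient is x."""
    return 0.5 * sum(xi * xi for xi in x)


def mean_sq_error(d, num_dirs, trials=200, seed=0):
    """Average squared error of the estimate at x = (1, ..., 1)."""
    rng = random.Random(seed)
    x = [1.0] * d
    total = 0.0
    for _ in range(trials):
        g = zo_two_point_grad(quadratic, x, num_dirs=num_dirs, rng=rng)
        total += sum((gi - xi) ** 2 for gi, xi in zip(g, x))
    return total / trials


if __name__ == "__main__":
    for d in (4, 32):
        print(d, mean_sq_error(d, num_dirs=1))
```

On this toy problem the squared error grows with the dimension and shrinks when more directions are averaged per step, which illustrates the dependence on $d$ that the abstract describes. Under the abstract's framing, a learned policy would replace the fixed Gaussian draw of `u` with a distribution that is updated during training.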

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Domain Adaptation and Few-Shot Learning · Generative Adversarial Networks and Image Synthesis