Prior-Informed Zeroth-Order Optimization with Adaptive Direction Alignment for Memory-Efficient LLM Fine-Tuning

Feihu Jin, Shipeng Cen, and Ying Tan

arXiv:2601.04710 · cs.CL · January 9, 2026

TL;DR

This paper introduces a prior-informed zeroth-order (ZO) optimization method with adaptive direction alignment for memory-efficient fine-tuning of large language models. By steering perturbations with a guiding vector computed from Gaussian samples, it converges faster than standard ZO approaches while still avoiding backpropagation.

Contribution

The paper presents a novel prior-informed perturbation technique for zeroth-order optimization that enhances gradient estimation accuracy and convergence in large language model fine-tuning.

Findings

01

Outperforms traditional ZO methods on OPT-13B across 11 benchmarks.

02

Achieves better results than gradient-based methods on 9 out of 11 tasks.

03

Demonstrates faster convergence and improved efficiency in large-scale LLM fine-tuning.

Abstract

Fine-tuning large language models (LLMs) has achieved remarkable success across various NLP tasks, but the substantial memory overhead during backpropagation remains a critical bottleneck, especially as model scales grow. Zeroth-order (ZO) optimization alleviates this issue by estimating gradients through forward passes and Gaussian sampling, avoiding the need for backpropagation. However, conventional ZO methods suffer from high variance in gradient estimation due to their reliance on random perturbations, leading to slow convergence and suboptimal performance. We propose a simple plug-and-play method that incorporates prior-informed perturbations to refine gradient estimation. Our method dynamically computes a guiding vector from Gaussian samples, which directs perturbations toward more informative directions, significantly accelerating convergence compared to standard ZO approaches.…
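
As a rough illustration of the conventional ZO baseline the abstract refers to (gradient estimation from forward passes and Gaussian sampling), the sketch below implements a generic two-point estimator in NumPy and uses it in a plain ZO-SGD loop. It is not the paper's prior-informed method: the guiding-vector computation is not specified in the text above, so it is not reproduced here. The names `zo_gradient` and `quadratic_loss`, the toy quadratic objective, and the values of `eps` and `lr` are all assumptions chosen for illustration.

```python
import numpy as np


def zo_gradient(loss_fn, theta, eps=1e-3, rng=None):
    """Two-point zeroth-order gradient estimate from one Gaussian direction.

    Generic baseline estimator (illustrative), not the paper's method.
    """
    rng = np.random.default_rng() if rng is None else rng
    z = rng.standard_normal(theta.shape)  # random perturbation direction
    # Two forward evaluations of the loss; no backpropagation is needed.
    slope = (loss_fn(theta + eps * z) - loss_fn(theta - eps * z)) / (2 * eps)
    return slope * z


def quadratic_loss(theta):
    """Toy objective (assumption); its true gradient is theta itself."""
    return 0.5 * float(np.sum(theta ** 2))


rng = np.random.default_rng(0)
theta = np.array([1.0, -2.0, 3.0])

# A single estimate is noisy; averaging many of them approaches the
# true gradient, which shows where the high variance comes from.
estimates = np.stack(
    [zo_gradient(quadratic_loss, theta, rng=rng) for _ in range(20000)]
)
mean_estimate = estimates.mean(axis=0)

# Plain ZO-SGD: each step uses one noisy estimate in place of a gradient.
params = theta.copy()
initial_loss = quadratic_loss(params)
lr = 0.05  # illustrative step size
for _ in range(200):
    params = params - lr * zo_gradient(quadratic_loss, params, rng=rng)
final_loss = quadratic_loss(params)
```

Each single estimate points along one random direction, so its spread grows with the parameter dimension. That is the variance problem the abstract says prior-informed perturbations are meant to reduce.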

Taxonomy

Topics: Topic Modeling · Domain Adaptation and Few-Shot Learning · Generative Adversarial Networks and Image Synthesis