Hi-ZFO: Hierarchical Zeroth- and First-Order LLM Fine-Tuning via Importance-Guided Tensor Selection
Feihu Jin, Ying Tan

TL;DR
Hi-ZFO introduces a hierarchical hybrid optimization method combining zeroth- and first-order techniques, adaptively applying them to different model layers to improve fine-tuning efficiency and generalization of large language models.
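The layer-wise split between the two optimizers can be pictured with a small sketch. It is written under stated assumptions and is not the authors' code. Each layer gets a scalar importance score, and the highest-scoring layers are routed to first-order (FO) updates while the rest go to zeroth-order (ZO) updates. The squared-gradient-norm proxy, the `fo_fraction` budget, and the function names `gradient_norm_importance` and `partition_layers` are all hypothetical choices made for illustration.

```python
import math
from typing import Dict, List, Tuple


def gradient_norm_importance(grads: Dict[str, List[float]]) -> Dict[str, float]:
    """Hypothetical importance proxy: squared L2 norm of each layer's gradient."""
    return {name: math.fsum(g * g for g in vals) for name, vals in grads.items()}


def partition_layers(importance: Dict[str, float],
                     fo_fraction: float = 0.25) -> Tuple[List[str], List[str]]:
    """Route the highest-scoring layers to FO updates and the rest to ZO updates.

    `fo_fraction` is an illustrative budget knob, not a parameter from the paper.
    """
    if not 0.0 <= fo_fraction <= 1.0:
        raise ValueError("fo_fraction must be in [0, 1]")
    # Rank layer names from most to least important.
    ranked = sorted(importance, key=lambda name: importance[name], reverse=True)
    n_fo = round(fo_fraction * len(ranked))
    return ranked[:n_fo], ranked[n_fo:]


if __name__ == "__main__":
    grads = {"layer0": [0.1, 0.2], "layer1": [1.0, -1.0],
             "layer2": [0.0, 0.05], "layer3": [0.5, 0.5]}
    fo, zo = partition_layers(gradient_norm_importance(grads), fo_fraction=0.5)
    print("FO:", fo, "ZO:", zo)  # FO: ['layer1', 'layer3'] ZO: ['layer0', 'layer2']
```

Here the sketch profiles importance once, from a single gradient snapshot. An adaptive scheme could recompute the scores periodically during training and re-partition the layers.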
Contribution
This paper proposes Hi-ZFO, a novel hybrid framework that adaptively combines zeroth- and first-order optimization for LLM fine-tuning, enhancing performance and reducing training time.
Findings
Achieves superior performance across diverse tasks.
Reduces training time significantly.
Effectively escapes local minima during training.
Abstract
Fine-tuning large language models (LLMs) using standard first-order (FO) optimization often drives training toward sharp, poorly generalizing minima. Conversely, zeroth-order (ZO) methods offer stronger exploratory behavior without relying on explicit gradients, yet suffer from slow convergence. More critically, our analysis reveals that in generative tasks, the vast output and search space significantly amplify estimation variance, rendering ZO methods both noisy and inefficient. To address these challenges, we propose Hi-ZFO (Hierarchical Zeroth- and First-Order optimization), a hybrid framework designed to synergize the precision of FO gradients with the exploratory capability of ZO estimation. Hi-ZFO adaptively partitions the model through layer-wise importance profiling, applying precise FO updates to critical layers while leveraging ZO…
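A hybrid FO/ZO update step can be sketched in a few lines of pure Python. This is a minimal illustration under stated assumptions and is not the authors' implementation. The abstract is truncated before it describes the ZO component, so the estimator here is a swapped-in, standard two-point SPSA-style estimate in the style of MeZO: two forward passes along one shared Gaussian direction `z`, with no backpropagation. The FO layers take an exact gradient step, and `grad_fn` stands in for backpropagation. The names `zo_grad_estimate`, `hybrid_step`, `quadratic_loss`, and `quadratic_grad`, as well as the values of `lr` and `eps`, are hypothetical.

```python
import random
from typing import Callable, Dict, List, Sequence

# Toy "model": layer name -> flat list of parameters.
Params = Dict[str, List[float]]
LossFn = Callable[[Params], float]
GradFn = Callable[[Params], Params]


def zo_grad_estimate(loss_fn: LossFn, params: Params, zo_layers: Sequence[str],
                     eps: float, rng: random.Random) -> Params:
    """Two-point (SPSA-style) gradient estimate for the ZO layers only.

    Copies parameters for clarity; memory-efficient ZO methods such as MeZO
    perturb in place and regenerate z from a saved random seed instead.
    """
    z = {name: [rng.gauss(0.0, 1.0) for _ in params[name]] for name in zo_layers}

    def shifted(sign: float) -> Params:
        out = {name: list(vals) for name, vals in params.items()}
        for name in zo_layers:
            out[name] = [x + sign * eps * zi for x, zi in zip(out[name], z[name])]
        return out

    # Central finite-difference slope of the loss along the random direction z.
    slope = (loss_fn(shifted(+1.0)) - loss_fn(shifted(-1.0))) / (2.0 * eps)
    return {name: [slope * zi for zi in z[name]] for name in zo_layers}


def hybrid_step(params: Params, loss_fn: LossFn, grad_fn: GradFn,
                fo_layers: Sequence[str], zo_layers: Sequence[str],
                rng: random.Random, lr: float = 0.05, eps: float = 1e-3) -> None:
    """One in-place update: exact gradient on FO layers, ZO estimate on the rest."""
    fo_grads = grad_fn(params)  # stands in for backprop; only FO entries are used
    zo_grads = zo_grad_estimate(loss_fn, params, zo_layers, eps, rng)
    for name in fo_layers:
        params[name] = [x - lr * g for x, g in zip(params[name], fo_grads[name])]
    for name in zo_layers:
        params[name] = [x - lr * g for x, g in zip(params[name], zo_grads[name])]


def quadratic_loss(p: Params) -> float:
    """Toy loss: sum of squares over all parameters."""
    return sum(x * x for vals in p.values() for x in vals)


def quadratic_grad(p: Params) -> Params:
    """Exact gradient of quadratic_loss."""
    return {name: [2.0 * x for x in vals] for name, vals in p.items()}


if __name__ == "__main__":
    rng = random.Random(0)
    params = {"layer0": [1.0, -2.0], "layer1": [0.5, 0.5], "layer2": [-1.0, 1.5]}
    start = quadratic_loss(params)
    for _ in range(200):
        hybrid_step(params, quadratic_loss, quadratic_grad,
                    ["layer0"], ["layer1", "layer2"], rng)
    print(f"loss: {start:.3f} -> {quadratic_loss(params):.3e}")
```

The following is a general property of this estimator and is not a result from the paper. With standard Gaussian `z` and an exact directional derivative, the estimate `(z·g) z` is unbiased, but its mean squared error is `(d + 1)·||g||²` for `d` ZO parameters. Its noise therefore grows with the number of parameters routed to ZO.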
