AdaZeta: Adaptive Zeroth-Order Tensor-Train Adaption for Memory-Efficient Large Language Models Fine-Tuning
Yifan Yang, Kai Zhen, Ershad Banijamali, Athanasios Mouchtaris, Zheng Zhang

TL;DR
AdaZeta introduces an adaptive zeroth-order tensor-train approach for memory-efficient large language model fine-tuning, improving performance, convergence, and memory usage over existing methods.
Contribution
The paper proposes AdaZeta, a novel framework with a tensorized adapter and adaptive query schedule to enhance zeroth-order fine-tuning of large language models.
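To give a feel for why a tensorized adapter has so few trainable parameters, the following is a minimal sketch of a generic tensor-train (TT) factorization in NumPy. It is not the paper's adapter layout: the mode sizes `dims`, the uniform TT rank `rank`, and the helper names `tt_cores`, `tt_num_params`, and `tt_to_full` are all assumptions chosen for illustration.

```python
import numpy as np


def tt_cores(dims, rank, rng):
    """Create random TT cores of shape (r_{k-1}, d_k, r_k) with boundary ranks 1.

    Hypothetical helper: a uniform internal rank is an illustrative assumption.
    """
    ranks = [1] + [rank] * (len(dims) - 1) + [1]
    return [
        rng.standard_normal((ranks[k], d, ranks[k + 1]))
        for k, d in enumerate(dims)
    ]


def tt_num_params(cores):
    """Total number of entries stored across all TT cores."""
    return sum(core.size for core in cores)


def tt_to_full(cores):
    """Contract the cores left to right into the full tensor of shape dims."""
    full = cores[0]  # shape (1, d_0, r_1)
    for core in cores[1:]:
        # Contract the trailing rank index with the next core's leading rank index.
        full = np.tensordot(full, core, axes=([-1], [0]))
    # Drop the two boundary rank axes of size 1.
    return full.squeeze(axis=(0, -1))


rng = np.random.default_rng(0)
# Assumed example: a 768 x 768 weight viewed as a 6-way tensor (8*8*12 = 768).
dims = (8, 8, 12, 8, 8, 12)
cores = tt_cores(dims, rank=4, rng=rng)
full = tt_to_full(cores)

print("TT parameters:  ", tt_num_params(cores))
print("Dense parameters:", full.size)
```

Under these assumed sizes the cores hold a few hundred numbers while the dense tensor holds several hundred thousand, which is the kind of reduction that makes dimension-dependent zeroth-order estimates less noisy.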
Findings
Achieves better accuracy compared to prior zeroth-order methods.
Reduces memory consumption during fine-tuning.
Ensures convergence with the adaptive query schedule.
Abstract
Fine-tuning large language models (LLMs) has achieved remarkable performance across various natural language processing tasks, yet it demands more and more memory as model sizes keep growing. To address this issue, the recently proposed Memory-efficient Zeroth-order (MeZO) methods attempt to fine-tune LLMs using only forward passes, thereby avoiding the need for a backpropagation graph. However, significant performance drops and a high risk of divergence have limited their widespread adoption. In this paper, we propose the Adaptive Zeroth-order Tensor-Train Adaption (AdaZeta) framework, specifically designed to improve the performance and convergence of the ZO methods. To enhance dimension-dependent ZO estimation accuracy, we introduce a fast-forward, low-parameter tensorized adapter. To tackle the frequently observed divergence issue in large-scale ZO fine-tuning tasks, we propose an…
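The forward-pass-only idea behind zeroth-order fine-tuning can be sketched with a standard two-point random-perturbation gradient estimator. This is a toy NumPy illustration on a quadratic loss, not the authors' implementation: the function names, the step size, the perturbation scale `eps`, and the fixed `num_queries` are assumptions. How AdaZeta adapts the number of queries during training is not described in the truncated abstract above, so the sketch leaves it as a plain parameter.

```python
import numpy as np


def zo_gradient(loss_fn, params, num_queries=1, eps=1e-3, rng=None):
    """Two-point zeroth-order gradient estimate averaged over several queries.

    Each query costs two evaluations of loss_fn and no backpropagation.
    """
    rng = np.random.default_rng() if rng is None else rng
    grad = np.zeros_like(params)
    for _ in range(num_queries):
        z = rng.standard_normal(params.shape)  # random perturbation direction
        delta = loss_fn(params + eps * z) - loss_fn(params - eps * z)
        grad += (delta / (2.0 * eps)) * z  # directional derivative times direction
    return grad / num_queries


def quadratic_loss(w):
    """Toy stand-in for a model loss; its true gradient is simply w."""
    return 0.5 * float(np.sum(w ** 2))


def zo_sgd(loss_fn, params, steps=200, lr=0.05, num_queries=4, seed=0):
    """Plain SGD driven only by zeroth-order gradient estimates."""
    rng = np.random.default_rng(seed)
    w = params.copy()
    for _ in range(steps):
        w -= lr * zo_gradient(loss_fn, w, num_queries=num_queries, rng=rng)
    return w


w0 = np.ones(8)
w_final = zo_sgd(quadratic_loss, w0)
print("initial loss:", quadratic_loss(w0))
print("final loss:  ", quadratic_loss(w_final))
```

Averaging over more queries lowers the variance of the estimate at the cost of more forward passes, and the variance also grows with the number of perturbed parameters, which is why a low-parameter adapter and a query schedule are natural levers for stabilizing this kind of training.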
Taxonomy
Topics: Tensor decomposition and applications · Topic Modeling · Computational Physics and Python Applications
