AdaZeta: Adaptive Zeroth-Order Tensor-Train Adaption for Memory-Efficient Large Language Models Fine-Tuning

Yifan Yang; Kai Zhen; Ershad Banijamal; Athanasios Mouchtaris; Zheng Zhang

arXiv:2406.18060 · cs.CL · December 4, 2024

TL;DR

AdaZeta introduces an adaptive zeroth-order tensor-train approach for memory-efficient large language model fine-tuning, improving performance, convergence, and memory usage over existing methods.

Contribution

The paper proposes AdaZeta, a novel framework with a tensorized adapter and adaptive query schedule to enhance zeroth-order fine-tuning of large language models.
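To illustrate why a tensorized adapter needs few parameters, here is a minimal sketch of a weight matrix stored in tensor-train (TT) format. The dimensions, the TT rank, and the helper names `make_tt_cores` and `tt_to_matrix` are illustrative assumptions, not values or code from the paper or its repository.

```python
import numpy as np

def make_tt_cores(in_dims, out_dims, rank, rng):
    """Create random TT cores; core k has shape (r_{k-1}, in_k, out_k, r_k).

    The boundary ranks are fixed to 1. All sizes here are hypothetical.
    """
    d = len(in_dims)
    ranks = [1] + [rank] * (d - 1) + [1]
    return [
        rng.standard_normal((ranks[k], in_dims[k], out_dims[k], ranks[k + 1]))
        for k in range(d)
    ]

def tt_to_matrix(cores, in_dims, out_dims):
    """Contract TT cores into a dense (prod(in_dims), prod(out_dims)) matrix."""
    result = cores[0]
    for core in cores[1:]:
        # Contract the trailing rank index with the next core's leading rank index.
        result = np.tensordot(result, core, axes=([result.ndim - 1], [0]))
    # Drop the two boundary rank-1 axes: shape is now (i1, o1, i2, o2, ...).
    result = result.reshape(result.shape[1:-1])
    d = len(cores)
    # Group all input axes first, then all output axes.
    perm = list(range(0, 2 * d, 2)) + list(range(1, 2 * d, 2))
    result = result.transpose(perm)
    return result.reshape(int(np.prod(in_dims)), int(np.prod(out_dims)))

rng = np.random.default_rng(0)
in_dims, out_dims, rank = (8, 8, 12), (4, 4, 4), 5  # assumed factorization of 768 x 64
cores = make_tt_cores(in_dims, out_dims, rank, rng)
W = tt_to_matrix(cores, in_dims, out_dims)

tt_params = sum(core.size for core in cores)
dense_params = W.size
print(W.shape, tt_params, dense_params)
```

With these assumed sizes the TT cores hold far fewer entries than the dense matrix they represent. This matters for zeroth-order methods, whose estimation accuracy depends on the number of trainable dimensions.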

Findings

01. Achieves higher accuracy than prior zeroth-order methods.

02. Reduces memory consumption during fine-tuning.

03. Ensures convergence with the adaptive query schedule.

Abstract

Fine-tuning large language models (LLMs) has achieved remarkable performance across various natural language processing tasks, yet it demands more and more memory as model sizes keep growing. To address this issue, the recently proposed Memory-efficient Zeroth-order (MeZO) methods attempt to fine-tune LLMs using only forward passes, thereby avoiding the need for a backpropagation graph. However, significant performance drops and a high risk of divergence have limited their widespread adoption. In this paper, we propose the Adaptive Zeroth-order Tensor-Train Adaption (AdaZeta) framework, specifically designed to improve the performance and convergence of the ZO methods. To enhance dimension-dependent ZO estimation accuracy, we introduce a fast-forward, low-parameter tensorized adapter. To tackle the frequently observed divergence issue in large-scale ZO fine-tuning tasks, we propose an…
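The forward-pass-only idea behind zeroth-order fine-tuning can be sketched with a two-point random-perturbation gradient estimate on a toy problem. This is a generic illustration, not the AdaZeta or MeZO implementation: the quadratic loss, the fixed `num_queries`, the step size, and the function names are all assumptions chosen for the example. In particular, it uses a constant number of queries per step rather than an adaptive schedule.

```python
import numpy as np

def zo_gradient(loss_fn, params, eps=1e-3, num_queries=1, rng=None):
    """Estimate the gradient from loss evaluations only (no backpropagation).

    Each query draws a Gaussian direction z and uses the central difference
    (L(p + eps*z) - L(p - eps*z)) / (2*eps) as a directional slope along z.
    Averaging several queries reduces the variance of the estimate.
    """
    rng = np.random.default_rng() if rng is None else rng
    grad = np.zeros_like(params)
    for _ in range(num_queries):
        z = rng.standard_normal(params.shape)
        loss_plus = loss_fn(params + eps * z)
        loss_minus = loss_fn(params - eps * z)
        grad += (loss_plus - loss_minus) / (2.0 * eps) * z
    return grad / num_queries

def zo_sgd(loss_fn, params, lr=0.05, steps=200, num_queries=4, seed=0):
    """Plain SGD driven by the zeroth-order gradient estimate."""
    rng = np.random.default_rng(seed)
    params = params.copy()
    for _ in range(steps):
        params -= lr * zo_gradient(loss_fn, params, num_queries=num_queries, rng=rng)
    return params

# Toy stand-in for a fine-tuning loss: a quadratic bowl around a hypothetical target.
target = np.linspace(-1.0, 1.0, 10)

def toy_loss(p):
    return 0.5 * float(np.sum((p - target) ** 2))

start = np.zeros(10)
initial_loss = toy_loss(start)
final = zo_sgd(toy_loss, start)
final_loss = toy_loss(final)
print(initial_loss, final_loss)
```

The variance of this estimator grows with the number of perturbed parameters, which is one way to see why the abstract describes ZO estimation accuracy as dimension-dependent. Unlike this sketch, a memory-efficient implementation would avoid materializing perturbed copies of the parameters.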

Code & Models

Repositories

yifanycc/adazeta (PyTorch, official)

Videos

AdaZeta: Adaptive Zeroth-Order Tensor-Train Adaption for Memory-Efficient Large Language Models Fine-Tuning

Taxonomy

Topics: Tensor decomposition and applications · Topic Modeling · Computational Physics and Python Applications