QuZO: Quantized Zeroth-Order Fine-Tuning for Large Language Models

Jiajun Zhou; Yifan Yang; Kai Zhen; Ziyue Liu; Yequan Zhao; Ershad Banijamali; Athanasios Mouchtaris; Ngai Wong; Zheng Zhang

arXiv:2502.12346·cs.LG·February 19, 2025

TL;DR

QuZO introduces a low-precision, zeroth-order fine-tuning framework for large language models that avoids error-prone backpropagation and achieves high accuracy with reduced memory usage.

Contribution

The paper proposes QuZO, a novel zeroth-order fine-tuning method for quantized LLMs that improves training stability and efficiency in low-precision settings.

Findings

01. Achieves accuracy comparable to first-order methods in FP8 training.

02. Outperforms first-order methods in accuracy under INT8 and INT4 training.

03. Reduces memory cost by 2.94 times in LLaMA2-7B fine-tuning.

Abstract

Large Language Models (LLMs) are often quantized to lower precision to reduce memory cost and inference latency. However, quantization often degrades model performance, so fine-tuning is required for various downstream tasks. Traditional fine-tuning methods such as stochastic gradient descent and Adam require backpropagation, which is error-prone in low-precision settings. To overcome these limitations, we propose the Quantized Zeroth-Order (QuZO) framework, specifically designed for fine-tuning LLMs through low-precision (e.g., 4- or 8-bit) forward passes. Our method avoids the error-prone low-precision straight-through estimator and uses optimized stochastic rounding to mitigate the increased bias. QuZO simplifies the training process while achieving results comparable to first-order methods in FP8 and superior accuracy in INT8 and ${\rm…
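
The two ingredients named in the abstract, stochastic rounding and gradient estimation from forward passes only, can be illustrated with a small NumPy sketch. This is not the paper's algorithm: it uses a generic two-point (SPSA-style) zeroth-order estimator with a stochastically rounded perturbation, keeps the weights in floating point for simplicity, and every name in it (`stochastic_round`, `zo_step`, `scale`, `eps`, `lr`) is a hypothetical choice made for illustration.

```python
import numpy as np


def stochastic_round(x, scale, rng, qmin=-127, qmax=127):
    """Round x onto the grid {scale * k} (assumed symmetric INT8 range).

    Each value is rounded up with probability equal to its fractional part,
    so the result is unbiased for in-range inputs, unlike round-to-nearest.
    """
    y = np.asarray(x, dtype=np.float64) / scale
    low = np.floor(y)
    round_up = rng.random(y.shape) < (y - low)  # True with prob. frac(y)
    q = np.clip(low + round_up, qmin, qmax)
    return q * scale


def zo_step(loss_fn, w, lr, eps, scale, rng):
    """One zeroth-order update that needs only two forward evaluations.

    Illustrative assumption: a standard two-point estimator, not QuZO's
    exact estimator. No backpropagation or straight-through estimator is used.
    """
    # Quantized random direction (Gaussian sample, stochastically rounded).
    u = stochastic_round(rng.standard_normal(w.shape), scale, rng)
    # Scalar estimate of the directional derivative along u.
    g = (loss_fn(w + eps * u) - loss_fn(w - eps * u)) / (2.0 * eps)
    return w - lr * g * u


if __name__ == "__main__":
    rng = np.random.default_rng(0)

    # Unbiasedness check: the sample mean should land near 0.3,
    # whereas round-to-nearest would always return 0.
    samples = stochastic_round(np.full(100_000, 0.3), 1.0, rng)
    print("mean of stochastically rounded 0.3:", samples.mean())

    # Toy quadratic "fine-tuning" problem (hypothetical stand-in for an LLM loss).
    target = np.array([1.0, -2.0, 0.5, 3.0])
    loss = lambda w: 0.5 * np.sum((w - target) ** 2)
    w = np.zeros(4)
    print("initial loss:", loss(w))
    for _ in range(500):
        w = zo_step(loss, w, lr=0.05, eps=1e-3, scale=1 / 16, rng=rng)
    print("final loss:", loss(w))
```

The sketch only shows why stochastic rounding is attractive here: the rounding error averages out in expectation, so it does not add a systematic bias to the forward-pass-based gradient estimate. A real low-precision setup would also quantize the weights and activations used inside `loss_fn`.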

Taxonomy

Topics: Topic Modeling · Speech Recognition and Synthesis