QuZO: Quantized Zeroth-Order Fine-Tuning for Large Language Models
Jiajun Zhou, Yifan Yang, Kai Zhen, Ziyue Liu, Yequan Zhao, Ershad Banijamali, Athanasios Mouchtaris, Ngai Wong, Zheng Zhang

TL;DR
QuZO introduces a low-precision, zeroth-order fine-tuning framework for large language models that avoids error-prone backpropagation and achieves high accuracy with reduced memory usage.
Contribution
The paper proposes QuZO, a novel zeroth-order fine-tuning method for quantized LLMs that improves training stability and efficiency in low-precision settings.
Findings
Achieves comparable performance to first-order methods in FP8.
Outperforms first-order methods in INT8 and INT4 training accuracy.
Reduces memory cost by 2.94 times in LLaMA2-7B fine-tuning.
Abstract
Large Language Models (LLMs) are often quantized to lower precision to reduce the memory cost and latency of inference. However, quantization often degrades model performance, so fine-tuning is required for various downstream tasks. Traditional fine-tuning methods such as stochastic gradient descent and Adam optimization require backpropagation, which is error-prone in low-precision settings. To overcome these limitations, we propose the Quantized Zeroth-Order (QuZO) framework, specifically designed for fine-tuning LLMs through low-precision (e.g., 4- or 8-bit) forward passes. Our method avoids the error-prone low-precision straight-through estimator and utilizes optimized stochastic rounding to mitigate the increased bias. QuZO simplifies the training process while achieving results comparable to first-order methods in FP8 and superior accuracy in INT8 and INT4 …
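To make the idea of fine-tuning through forward passes only more concrete, here is a minimal NumPy sketch of a generic two-point zeroth-order gradient estimate whose random perturbation is quantized with stochastic rounding. It is not the authors' implementation: the function names, the toy quadratic loss, the grid spacing `scale`, the step size, and the choice to quantize only the perturbation are all assumptions made for illustration, and the abstract does not specify QuZO's actual quantizers or update rule.

```python
import numpy as np

def stochastic_round(x, scale, rng):
    """Round x to a multiple of `scale`, up or down at random, so E[result] == x."""
    y = x / scale
    low = np.floor(y)
    # Round up with probability equal to the fractional part (unbiased rounding).
    up = rng.random(y.shape) < (y - low)
    return (low + up) * scale

def zo_gradient(loss_fn, w, eps, scale, rng):
    """Two-point zeroth-order gradient estimate with a quantized perturbation."""
    # Hypothetical choice: Gaussian direction, stochastically rounded to the grid.
    u = stochastic_round(rng.standard_normal(w.shape), scale, rng)
    # Only forward evaluations of the loss are used; no backpropagation.
    delta = loss_fn(w + eps * u) - loss_fn(w - eps * u)
    return delta / (2.0 * eps) * u

def toy_loss(w):
    """Toy stand-in for a model's forward pass: quadratic with minimum at w = 1."""
    return float(np.sum((w - 1.0) ** 2))

rng = np.random.default_rng(0)
w = np.zeros(8)
initial_loss = toy_loss(w)
for _ in range(500):
    w = w - 0.01 * zo_gradient(toy_loss, w, eps=1e-3, scale=2.0 ** -4, rng=rng)
final_loss = toy_loss(w)
```

In this sketch the weights and the loss stay in full precision; a real low-precision setup would also quantize the model's forward pass, which is outside what this summary describes.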
Taxonomy
Topics: Topic Modeling · Speech Recognition and Synthesis
