ApiQ: Finetuning of 2-Bit Quantized Large Language Model

Baohao Liao; Christian Herold; Shahram Khadivi; Christof Monz

arXiv:2402.05147 · cs.LG · June 24, 2024 · 1 citation

TL;DR

ApiQ introduces a novel quantization framework that preserves model knowledge during low-bit finetuning of large language models, leading to improved performance across diverse tasks and bit-widths.

Contribution

The paper presents ApiQ, a new quantization method that initializes LoRA components and quantizes weights simultaneously to reduce information loss during low-bit LLM finetuning.
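As we read the paper, the difference from QLoRA can be written as a per-layer objective. The notation here is ours, not necessarily the authors': W is a pretrained weight matrix, Q its quantized version, A and B the LoRA factors, X the layer input in the full-precision model, and X^q the input the same layer receives inside the quantized model. QLoRA starts finetuning with the LoRA product set to zero, so each layer inherits the full quantization error. ApiQ instead chooses Q, A, and B together so that the quantized layer reproduces the full-precision output.

```latex
\begin{aligned}
\text{QLoRA-style init } (AB^\top = 0):&\quad \mathcal{E} = \lVert XW - X^{q}Q \rVert_F \\
\text{ApiQ (activation-preserving init):}&\quad \min_{Q,\,A,\,B}\ \lVert XW - X^{q}\,(Q + AB^\top) \rVert_F
\end{aligned}
```

In this formulation, measuring the error on the quantized model's own input X^q, rather than on X, is what allows each layer's initialization to absorb part of the error accumulated in earlier layers.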

Findings

1. ApiQ minimizes the activation error introduced by quantization.

2. ApiQ achieves superior finetuning results across various bit-widths.

3. ApiQ preserves the activations of the full-precision model while mitigating error propagation from shallower to deeper layers.
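The NumPy sketch below illustrates Findings 1 and 3 on a toy two-layer ReLU network with random weights. It is not the authors' code, and it simplifies the method in two loudly stated ways: Q is fixed per-column round-to-nearest 2-bit quantization rather than learned, and the LoRA factors come from a closed-form reduced-rank regression instead of ApiQ's gradient-based joint optimization. The helper names `quantize_rtn`, `lora_init`, and `rel_err` are hypothetical. It compares a zero LoRA initialization (as in QLoRA) with an activation-aware one, and, at the second layer, an initialization that ignores upstream quantization error with one fitted to the quantized model's actual input.

```python
import numpy as np

def quantize_rtn(w, bits=2):
    """Hypothetical helper: per-column asymmetric round-to-nearest quantization.
    Returns the dequantized weights so they can be used directly in matmuls."""
    levels = 2**bits - 1
    w_min, w_max = w.min(axis=0, keepdims=True), w.max(axis=0, keepdims=True)
    scale = np.where(w_max > w_min, (w_max - w_min) / levels, 1.0)
    zero = np.round(-w_min / scale)
    return (np.clip(np.round(w / scale) + zero, 0, levels) - zero) * scale

def lora_init(x_fp, x_q, w, q, rank):
    """Hypothetical helper: rank-r factors (A, B) minimizing
    ||x_fp @ w - x_q @ (q + A @ B.T)||_F with q held fixed (reduced-rank regression)."""
    target = x_fp @ w - x_q @ q                   # output error left by quantization
    m_ols, *_ = np.linalg.lstsq(x_q, target, rcond=None)  # unconstrained best correction
    _, _, vt = np.linalg.svd(x_q @ m_ols, full_matrices=False)
    v_r = vt[:rank].T                             # top-r right singular vectors, (d_out, r)
    return m_ols @ v_r, v_r                       # A: (d_in, r), B: (d_out, r)

def rel_err(x_fp, x_q, w, q, a, b):
    """Relative activation error of the quantized layer vs. the full-precision layer."""
    return np.linalg.norm(x_fp @ w - x_q @ (q + a @ b.T)) / np.linalg.norm(x_fp @ w)

def relu(z):
    return np.maximum(z, 0.0)

rng = np.random.default_rng(0)
n, d, r = 512, 64, 8
w1 = rng.normal(size=(d, d)) / np.sqrt(d)
w2 = rng.normal(size=(d, d)) / np.sqrt(d)
x = rng.normal(size=(n, d))
q1, q2 = quantize_rtn(w1), quantize_rtn(w2)
zeros = np.zeros((d, r))

# Layer 1: the full-precision and quantized models see the same input x.
err1_zero = rel_err(x, x, w1, q1, zeros, zeros)    # QLoRA-style init: A @ B.T == 0
a1, b1 = lora_init(x, x, w1, q1, r)
err1_act = rel_err(x, x, w1, q1, a1, b1)

# Layer 2: the quantized model's input already carries layer-1 error.
h_fp = relu(x @ w1)
h_q = relu(x @ (q1 + a1 @ b1.T))
a2_naive, b2_naive = lora_init(h_fp, h_fp, w2, q2, r)  # ignores upstream error
a2_prop, b2_prop = lora_init(h_fp, h_q, w2, q2, r)     # fits the quantized input
err2_naive = rel_err(h_fp, h_q, w2, q2, a2_naive, b2_naive)
err2_prop = rel_err(h_fp, h_q, w2, q2, a2_prop, b2_prop)

print(f"layer 1: zero init {err1_zero:.3f}, activation-aware init {err1_act:.3f}")
print(f"layer 2: input-agnostic init {err2_naive:.3f}, propagation-aware init {err2_prop:.3f}")
```

Because each activation-aware fit is optimal for exactly the error being measured, it cannot do worse than the zero or input-agnostic initialization; the specific numbers depend on the random seed and toy sizes.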

Abstract

Memory-efficient finetuning of large language models (LLMs) has recently attracted huge attention with the increasing size of LLMs, primarily due to the constraints posed by GPU memory limitations and the effectiveness of these methods compared to full finetuning. Despite the advancements, current strategies for memory-efficient finetuning, such as QLoRA, exhibit inconsistent performance across diverse bit-width quantizations and multifaceted tasks. This inconsistency largely stems from the detrimental impact of the quantization process on preserved knowledge, leading to catastrophic forgetting and undermining the utilization of pretrained models for finetuning purposes. In this work, we introduce a novel quantization framework, ApiQ, designed to restore the lost information from quantization by concurrently initializing the LoRA components and quantizing the weights of LLMs. This…
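The abstract attributes QLoRA's inconsistency to the information lost during quantization. A quick way to see why this bites hardest at 2 bits: under uniform round-to-nearest quantization (an assumption here; ApiQ's actual quantizer may differ, and the helper name `quantize_uniform` is hypothetical), each bit removed halves the number of quantization levels, so the output error of a random linear layer grows sharply as the bit-width drops.

```python
import numpy as np

def quantize_uniform(w, bits):
    """Hypothetical helper: per-column asymmetric min-max quantization, returned dequantized."""
    levels = 2**bits - 1
    w_min = w.min(axis=0, keepdims=True)
    w_max = w.max(axis=0, keepdims=True)
    scale = np.where(w_max > w_min, (w_max - w_min) / levels, 1.0)
    zero = np.round(-w_min / scale)
    q = np.clip(np.round(w / scale) + zero, 0, levels)
    return (q - zero) * scale

rng = np.random.default_rng(0)
w = rng.normal(size=(512, 512)) / np.sqrt(512)   # toy weight matrix
x = rng.normal(size=(128, 512))                  # toy input activations

errors = {}
for bits in (4, 3, 2):
    q = quantize_uniform(w, bits)
    # Relative error of the layer output X @ Q against the full-precision X @ W.
    errors[bits] = np.linalg.norm(x @ w - x @ q) / np.linalg.norm(x @ w)
    print(f"{bits}-bit relative output error: {errors[bits]:.3f}")
```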

Code & Models

Repositories

baohaoliao/apiq (PyTorch, official implementation)

Taxonomy

Topics: Neural Networks and Applications · Topic Modeling · Speech Recognition and Synthesis