LCQ: Low-Rank Codebook based Quantization for Large Language Models

Wen-Pu Cai; Ming-Yang Li; Wu-Jun Li

arXiv:2405.20973·cs.LG·February 11, 2025

TL;DR

This paper introduces LCQ, a low-rank codebook quantization method for large language models that improves accuracy over traditional rank-one codebook methods while maintaining low storage costs.

Contribution

The paper proposes a novel low-rank codebook quantization technique that enhances model accuracy without significantly increasing storage requirements.

Findings

01

LCQ outperforms existing quantization methods in accuracy.

02

LCQ achieves this with negligible additional storage.

03

Experiments demonstrate improved performance on large language models.

Abstract

Large language models (LLMs) have recently demonstrated promising performance in many tasks. However, their high storage and computational cost has become a challenge for deployment. Weight quantization has been widely used for model compression, as it can reduce both storage and computational cost. Most existing weight quantization methods for LLMs use a rank-one codebook for quantization, which results in substantial accuracy loss when the compression ratio is high. In this paper, we propose a novel weight quantization method for LLMs, called low-rank codebook based quantization (LCQ). LCQ adopts a low-rank codebook, the rank of which can be larger than one, for quantization. Experiments show that LCQ can achieve better accuracy than existing methods with negligible extra storage cost.
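To make the distinction between a rank-one and a low-rank codebook concrete, here is a minimal sketch in plain Python. It is not the paper's implementation. It assumes, for illustration only, that the codebook is a matrix with one row of quantization values per weight group, that a rank-one codebook is the outer product of per-group scales and a shared vector of levels, and that a low-rank codebook is a product of two small factor matrices. All function names, the nearest-entry assignment rule, and the parameter count are hypothetical choices made for this example.

```python
# Illustrative sketch only: names and the factorization layout are assumptions,
# not taken from the LCQ paper.

def matmul(A, B):
    """Multiply A (G x r) by B (r x K), both given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def rank_one_codebook(scales, levels):
    """Row g equals scales[g] * levels: an outer product, hence rank one."""
    return matmul([[s] for s in scales], [list(levels)])

def low_rank_codebook(A, B):
    """Codebook A @ B with inner dimension r, so its rank is at most r."""
    return matmul(A, B)

def quantize(weights, codebook):
    """For each weight in group g, pick the index of the nearest entry in codebook[g]."""
    return [
        [min(range(len(row)), key=lambda k: abs(w - row[k])) for w in group]
        for group, row in zip(weights, codebook)
    ]

def dequantize(indices, codebook):
    """Look the stored indices back up in each group's codebook row."""
    return [[row[k] for k in idx] for idx, row in zip(indices, codebook)]

def codebook_params(num_groups, num_levels, rank):
    """Parameters needed to store the two factors of a rank-`rank` codebook."""
    return rank * (num_groups + num_levels)

levels = [-1.0, 0.0, 1.0, 2.0]             # shared quantization levels (2-bit)
scales = [0.5, 2.0]                        # one scale per weight group
weights = [[0.4, -0.6, 0.9], [3.1, -1.2, 0.3]]

c1 = rank_one_codebook(scales, levels)
# Rank-two variant: a second factor lets each group bend its levels independently.
c2 = low_rank_codebook([[0.5, 0.1], [2.0, -0.3]], [levels, [1.0, 0.0, 0.0, 1.0]])

idx1 = quantize(weights, c1)
w_hat1 = dequantize(idx1, c1)
w_hat2 = dequantize(quantize(weights, c2), c2)
```

Under these assumptions the rank-one codebook is the special case with inner dimension one, and the factors grow only linearly with the rank, which is one way to read the abstract's claim that the extra storage is small relative to the weights themselves.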
