CRVQ: Channel-Relaxed Vector Quantization for Extreme Compression of LLMs

Yuzhuang Xu; Shiyu Ji; Qingfu Zhu; Wanxiang Che

arXiv:2412.09282 · cs.LG · February 20, 2025

TL;DR

CRVQ is a post-training quantization technique that improves extreme compression of large language models by relaxing the quantization constraint on a very small subset of critical weight channels, achieving near-lossless 1-bit compression at the cost of minimal additional bits.

Contribution

This paper introduces CRVQ, a new channel-relaxed vector quantization method that enhances post-training quantization performance for LLMs at very low bit-widths.

Findings

01. 38.9% improvement over the current strongest sub-2-bit PTQ baseline

02. Enables near-lossless 1-bit compression

03. Offers flexible bit-width customization

Abstract

Powerful large language models (LLMs) are increasingly expected to be deployed at lower computational cost, making their capabilities available on resource-constrained devices. Post-training quantization (PTQ) has emerged as a leading approach to this goal, with the best methods compressing weights to fewer than 2 bits on average. In this paper, we propose Channel-Relaxed Vector Quantization (CRVQ), a novel technique that significantly improves the performance of PTQ baselines at the cost of only minimal additional bits. This state-of-the-art extreme compression method achieves its results through two key innovations: (1) carefully selecting and reordering a very small subset of critical weight channels, and (2) leveraging extended codebooks to relax the constraint on critical channels. With our method, we demonstrate a 38.9% improvement over the current strongest sub-2-bit PTQ baseline,…
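The two steps named in the abstract, selecting and reordering critical channels and then giving them extended codebooks, can be illustrated with a toy NumPy sketch. This is not the authors' implementation. The importance proxy (per-channel error of a first-pass quantization), the k-means codebook fitting, the residual form of the extra codebooks, and every function name and parameter value below are assumptions made for illustration only.

```python
import numpy as np


def kmeans(vectors, k, iters=10, seed=0):
    """Plain Lloyd's k-means. Returns (codebook, assignments)."""
    rng = np.random.default_rng(seed)
    codebook = vectors[rng.choice(len(vectors), size=k, replace=False)].copy()
    for _ in range(iters):
        dists = ((vectors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
        assign = dists.argmin(axis=1)
        for j in range(k):
            members = vectors[assign == j]
            if len(members):
                codebook[j] = members.mean(axis=0)
    # Final assignment against the final codebook.
    dists = ((vectors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    return codebook, dists.argmin(axis=1)


def vq(block, dim, k, seed=0):
    """Quantize a 2-D block with a single codebook; returns the reconstruction.

    Each vector is `dim` consecutive entries along a row, so it spans
    `dim` adjacent channels (columns).
    """
    vectors = block.reshape(-1, dim)
    codebook, assign = kmeans(vectors, k, seed=seed)
    return codebook[assign].reshape(block.shape)


def crvq_sketch(W, dim=4, k=16, n_crit=8, extra_books=2):
    """Toy channel-relaxed VQ. Returns (reconstruction, average bits per weight).

    Assumptions: columns are channels, n_crit is a multiple of dim, and
    codebook/permutation storage is ignored in the bit count.
    """
    # Step 1: rank channels by an assumed importance proxy and reorder so the
    # critical ones sit together at the front.
    first_pass = vq(W, dim, k)
    importance = ((W - first_pass) ** 2).sum(axis=0)
    order = np.argsort(-importance)
    Wp = W[:, order]

    # Base codebook for every channel.
    approx = vq(Wp, dim, k)

    # Step 2: extra codebooks on the critical channels only, each fitted to
    # the residual left by the previous ones.
    for b in range(extra_books):
        resid = Wp[:, :n_crit] - approx[:, :n_crit]
        approx[:, :n_crit] += vq(resid, dim, k, seed=b + 1)

    bits_per_code = np.log2(k) / dim
    avg_bits = bits_per_code * (1 + extra_books * n_crit / W.shape[1])

    # Undo the channel permutation.
    return approx[:, np.argsort(order)], avg_bits
```

In this sketch the extra cost scales with the fraction of channels marked critical: with `k=16`, `dim=4`, a 32-channel matrix, 8 critical channels and 2 extra codebooks, the index cost rises from 1.0 to 1.5 bits per weight. Marking a much smaller fraction as critical, as the abstract describes, keeps the overhead correspondingly small.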
