CCQ: Convolutional Code for Extreme Low-bit Quantization in LLMs

Zhaojing Zhou; Xunchao Li; Minghao Li; Handi Zhang; Haoshuang Wang; Wenbin Chang; Yiqun Liu; Qingqing Dang; Dianhai Yu; Yanjun Ma; Haifeng Wang

arXiv:2507.07145·cs.LG·July 11, 2025

TL;DR

This paper introduces CCQ, a low-bit quantization method for LLMs that compresses models to 2.0-2.75 bits with minimal accuracy loss, making it possible to deploy large models on a single GPU.

Contribution

CCQ presents a hardware-aware, lookup-free quantization approach that overcomes accuracy and speed bottlenecks in extreme low-bit LLM compression.

Findings

01 Achieves 2-3 bit compression with minimal accuracy loss

02 Enables single-GPU deployment of large models

03 Open-sourced the 2-bit ERNIE-4.5 model
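To give a rough sense of why bit-width matters for single-GPU deployment, the sketch below estimates weight-only storage at several bit-widths. The parameter count is an assumption: a nominal 300 billion, read off the "300B" in the ERNIE-4.5-300B-A47B model name listed on this page. The helper `weight_footprint_gb` is hypothetical, and the estimate ignores quantization metadata (scales, codebooks), activations, and the KV cache, so real memory use would be higher.

```python
def weight_footprint_gb(num_params: float, bits_per_weight: float) -> float:
    """Weight-only storage in decimal gigabytes.

    Ignores scales, codebooks, activations, and KV cache.
    """
    return num_params * bits_per_weight / 8 / 1e9


# Assumption: nominal parameter count taken from the "300B" in the model name.
NUM_PARAMS = 300e9

footprints = {
    bits: weight_footprint_gb(NUM_PARAMS, bits) for bits in (16, 4, 2.75, 2.0)
}

for bits, gb in footprints.items():
    print(f"{bits:>5} bits/weight -> {gb:7.1f} GB")
```

Under these assumptions, going from 16 bits to 2 bits per weight shrinks the weights by a factor of eight; whether the result fits on a given GPU depends on that device's memory and on the overheads left out here.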

Abstract

The rapid scaling of Large Language Models (LLMs) elevates inference costs and compounds substantial deployment barriers. While quantization to 8 or 4 bits mitigates this, sub-3-bit methods face severe accuracy, scalability, and efficiency degradation. We propose Convolutional Code Quantization (CCQ), an inference-optimized quantization approach compressing LLMs to 2.0-2.75 bits with minimal accuracy loss. Departing from error-prone scalar quantization or slow vector quantization, CCQ integrates a hardware-aware bit-shift encoding and decoding solution with Convolutional Code, Hybrid Encoding, and Code Cluster, jointly overcoming accuracy-speed bottlenecks. We construct a lookup-free encoding space, enabling a linear mapping between the codebook and weight vectors, thereby optimizing inference performance. Meanwhile, by drawing on the concept of data mapping from vector quantization, we…
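The abstract's idea of a bit-shift, lookup-free decoder can be illustrated with a toy sketch. This is not the paper's scheme: the window size, step size, scale, and the centered linear mapping below are all assumptions chosen for illustration, and the function names are hypothetical. The sketch shows only the general pattern in which each weight is read from an overlapping window of a packed bitstream (so each new weight costs `k` fresh bits) and the window's integer value is mapped to a weight arithmetically rather than through a codebook lookup.

```python
def window_states(bits: int, n_weights: int, k: int = 2, window: int = 4) -> list[int]:
    """Read overlapping `window`-bit states from a packed integer.

    Consecutive states are offset by `k` bits, so neighbouring states
    share `window - k` bits (the convolutional-code-style overlap).
    """
    mask = (1 << window) - 1
    return [(bits >> (i * k)) & mask for i in range(n_weights)]


def decode(bits: int, n_weights: int, k: int = 2, window: int = 4,
           scale: float = 0.1) -> list[float]:
    """Map each state to a weight with a linear formula (no lookup table).

    Assumption: a symmetric grid centered on zero; a real method would
    choose the mapping and scale to fit the weight distribution.
    """
    center = ((1 << window) - 1) / 2
    return [scale * (s - center) for s in window_states(bits, n_weights, k, window)]


packed = 0b10110100          # 8 stored bits
states = window_states(packed, n_weights=4)
weights = decode(packed, n_weights=4)
print(states)
print(weights)
```

In this toy setting four weights are recovered from eight stored bits, that is 2 bits per weight, while each weight is drawn from a 16-level grid. Reads past the top of the packed integer see zero bits, which stands in for padding. Encoding, which has to search for a bitstream whose windows approximate the target weights, is deliberately left out.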

Code & Models

Models

🤗 turboderp/ERNIE-4.5-300B-A47B-PT-exl3 (model, ♡ 3)

Taxonomy

Topics: Advanced Neural Network Applications · Natural Language Processing Techniques · Speech Recognition and Synthesis