QTIP: Quantization with Trellises and Incoherence Processing

Albert Tseng; Qingyao Sun; David Hou; Christopher De Sa

arXiv:2406.11235 · cs.LG · June 19, 2025 · 3 cites

TL;DR

QTIP uses trellis coded quantization (TCQ) to quantize LLM weights at ultra-high dimension, improving both quantization quality and inference speed over vector quantization methods, whose codebook size restricts them to low dimensions.

Contribution

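The paper introduces QTIP, a post-training quantization method that compresses LLM weights with trellis coded quantization. Its stateful decoder separates codebook size from bitrate and effective dimension, avoiding the exponential codebook growth that limits vector quantization to low dimensions.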

Findings

01. QTIP achieves state-of-the-art quantization quality.

02. QTIP improves inference speed in LLMs.

03. QTIP enables ultra-high-dimensional quantization.

Abstract

Post-training quantization (PTQ) reduces the memory footprint of LLMs by quantizing weights to low-precision datatypes. Since LLM inference is usually memory-bound, PTQ methods can improve inference throughput. Recent state-of-the-art PTQ approaches use vector quantization (VQ) to quantize multiple weights at once, which improves information utilization through better shaping. However, VQ requires a codebook with size exponential in the dimension. This limits current VQ-based PTQ works to low VQ dimensions ($\leq 8$) that in turn limit quantization quality. Here, we introduce QTIP, which instead uses trellis coded quantization (TCQ) to achieve ultra-high-dimensional quantization. TCQ uses a stateful decoder that separates the codebook size from the bitrate and effective dimension. QTIP introduces a spectrum of lookup-only to computed lookup-free trellis codes designed for a…
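
The codebook-size argument in the abstract can be made concrete with a toy example. The sketch below is an illustration only, not the authors' implementation: every function name is hypothetical, the state update (shift in `k` new bits, keep the low `state_bits` bits) is one simple trellis construction assumed here for clarity, and the random Gaussian lookup table is a placeholder for a designed code. It shows that an unstructured VQ codebook needs `2 ** (bits_per_weight * dim)` entries (65,536 at 2 bits and dimension 8), while the stateful decoder's table has `2 ** state_bits` entries however long the quantized sequence is.

```python
import random


def vq_codebook_entries(bits_per_weight: int, dim: int) -> int:
    """Entries in an unstructured VQ codebook: exponential in the dimension."""
    return 2 ** (bits_per_weight * dim)


def make_state_values(state_bits: int, seed: int = 0) -> list[float]:
    """Placeholder lookup table (an assumption): one reconstruction value per state."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(2 ** state_bits)]


def decode(symbols: list[int], start_state: int, state_bits: int, k: int,
           values: list[float]) -> list[float]:
    """Stateful decoder: shift k new bits into the state, emit values[state]."""
    mask = (1 << state_bits) - 1
    state = start_state
    out = []
    for sym in symbols:  # each sym is an integer in [0, 2**k)
        state = ((state << k) | sym) & mask
        out.append(values[state])
    return out


def viterbi_encode(weights: list[float], state_bits: int, k: int,
                   values: list[float]) -> tuple[list[int], float]:
    """Find the k-bit-per-weight symbol sequence with minimum squared error.

    Assumes k <= state_bits and a fixed start state of 0.
    """
    n_states = 1 << state_bits
    mask = n_states - 1
    inf = float("inf")
    cost = [inf] * n_states  # cost[s]: best error of any path ending in state s
    cost[0] = 0.0
    back = []  # back[t][s]: predecessor of s on the best path at step t
    for w in weights:
        new_cost = [inf] * n_states
        prev = [-1] * n_states
        for s in range(n_states):
            if cost[s] == inf:
                continue  # state not reachable yet
            for sym in range(1 << k):
                nxt = ((s << k) | sym) & mask
                c = cost[s] + (w - values[nxt]) ** 2
                if c < new_cost[nxt]:
                    new_cost[nxt] = c
                    prev[nxt] = s
        back.append(prev)
        cost = new_cost
    # Trace back from the cheapest final state; the symbol that led into a
    # state is that state's low k bits.
    state = min(range(n_states), key=cost.__getitem__)
    best = cost[state]
    symbols = []
    for prev in reversed(back):
        symbols.append(state & ((1 << k) - 1))
        state = prev[state]
    symbols.reverse()
    return symbols, best


if __name__ == "__main__":
    values = make_state_values(state_bits=8)
    rng = random.Random(1)
    weights = [rng.gauss(0.0, 1.0) for _ in range(32)]
    symbols, err = viterbi_encode(weights, state_bits=8, k=2, values=values)
    # Table size stays at 2**8 entries regardless of how many weights are coded.
    print(len(values), len(symbols), err / len(weights))
```

In this toy setting the encoder searches all paths through the trellis with dynamic programming, so its cost grows linearly with the number of weights, and decoding needs only the current state and the next `k` bits.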

Code & Models

Videos

QTIP: Quantization with Trellises and Incoherence Processing (SlidesLive)

Taxonomy

Topics: Neural Networks and Applications