DecDEC: A Systems Approach to Advancing Low-Bit LLM Quantization

Yeonhong Park; Jake Hyun; Hojoon Kim; Jae W. Lee

arXiv:2412.20185 · cs.LG · June 25, 2025

TL;DR

DecDEC is an inference scheme for low-bit quantized large language models. It keeps the quantization residuals in CPU memory and fetches them only for salient channels, correcting quantization error there and improving model quality with negligible extra GPU memory and a small latency cost.

Contribution

DecDEC introduces a dynamic residual fetching method that improves the accuracy of low-bit quantized LLMs while preserving the GPU memory savings and latency reduction that quantization provides.

Findings

01. Reduces perplexity of 3-bit Llama-3-8B-Instruct from 10.15 to 9.12
02. Adds less than 0.0003% GPU memory overhead
03. Increases inference latency by only 1.7% on NVIDIA RTX 4050 Mobile

Abstract

Quantization of Large Language Models (LLMs) has recently gained popularity, particularly for on-device settings with limited hardware resources. While efficient, quantization inevitably degrades model quality, especially in aggressive low-bit settings such as 3-bit and 4-bit precision. In this paper, we propose DecDEC, an inference scheme that improves the quality of low-bit LLMs while preserving the key benefits of quantization: GPU memory savings and latency reduction. DecDEC stores the residual matrix -- the difference between full-precision and quantized weights -- in CPU, and dynamically fetches the residuals for only a small portion of the weights. This portion corresponds to the salient channels, marked by activation outliers, with the fetched residuals helping to correct quantization errors in these channels. Salient channels are identified dynamically at each decoding step by…
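The residual-correction idea described in the abstract can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the uniform quantizer, the helper names (`quantize`, `residual_matrix`, `top_k_channels`, `corrected_matvec`), and the rule of picking the k input channels with the largest absolute activation are all hypothetical choices made here, since the abstract is cut off before it states how salient channels are identified. The sketch also ignores the CPU-to-GPU transfer entirely and only shows the arithmetic.

```python
import random

def quantize(w, bits=3):
    """Uniform round-to-nearest quantizer (an assumed stand-in, not the paper's)."""
    flat = [v for row in w for v in row]
    lo, hi = min(flat), max(flat)
    levels = (1 << bits) - 1
    scale = (hi - lo) / levels or 1.0  # guard against a constant matrix
    return [[lo + round((v - lo) / scale) * scale for v in row] for row in w]

def residual_matrix(w, w_q):
    """Residual = full-precision weights minus quantized weights."""
    return [[a - b for a, b in zip(rw, rq)] for rw, rq in zip(w, w_q)]

def matvec(x, w, rows=None):
    """Compute x @ w, optionally restricted to the given input channels (rows)."""
    rows = range(len(w)) if rows is None else rows
    return [sum(x[i] * w[i][j] for i in rows) for j in range(len(w[0]))]

def top_k_channels(x, k):
    """Assumed saliency rule: the k channels with the largest |activation|."""
    return sorted(range(len(x)), key=lambda i: abs(x[i]), reverse=True)[:k]

def corrected_matvec(x, w_q, residual, k):
    """Quantized output plus a correction from the residual rows of k channels."""
    base = matvec(x, w_q)
    fix = matvec(x, residual, rows=top_k_channels(x, k))
    return [b + f for b, f in zip(base, fix)]

if __name__ == "__main__":
    random.seed(0)
    w = [[random.gauss(0, 1) for _ in range(4)] for _ in range(8)]
    x = [random.gauss(0, 0.1) for _ in range(8)]
    x[2], x[5] = 6.0, -4.0  # two artificial activation outliers
    w_q = quantize(w)
    r = residual_matrix(w, w_q)
    exact = matvec(x, w)
    for k in (0, 2, 8):
        y = corrected_matvec(x, w_q, r, k)
        err = max(abs(a - b) for a, b in zip(exact, y))
        print(f"k={k}: max abs error = {err:.6f}")
```

With k = 0 the result is the plain quantized output, and with k equal to the number of input channels the correction recovers the full-precision product up to floating-point rounding. Values of k in between trade accuracy against the number of residual rows that have to be fetched.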

Taxonomy

Topics: VLSI and Analog Circuit Testing · Advancements in Photolithography Techniques · Advancements in Semiconductor Devices and Circuit Design