TesseraQ: Ultra Low-Bit LLM Post-Training Quantization with Block Reconstruction
Yuhang Li, Priyadarshini Panda

TL;DR
TesseraQ is a post-training quantization (PTQ) method that reduces the memory footprint of large language models by optimizing weight rounding and dequantization scale parameters, reporting state-of-the-art results at ultra-low bit widths.
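To give a rough sense of what a lower bit width means for memory, the following sketch estimates weight storage at different precisions. The 7-billion-parameter model size, the group size of 128, and the 16-bit per-group scale are illustrative assumptions, not figures taken from this page.

```python
def weight_memory_gib(num_params, bits_per_weight, group_size=None, scale_bits=16):
    """Estimate weight storage in GiB.

    group_size and scale_bits model the overhead of one stored scale per
    group of weights; both are hypothetical knobs for illustration.
    """
    total_bits = num_params * bits_per_weight
    if group_size is not None:
        total_bits += (num_params / group_size) * scale_bits
    return total_bits / 8 / 2**30


# Hypothetical 7-billion-parameter model.
n = 7_000_000_000
fp16 = weight_memory_gib(n, 16)
w2 = weight_memory_gib(n, 2)
w2_grouped = weight_memory_gib(n, 2, group_size=128)
print(f"16-bit: {fp16:.2f} GiB, 2-bit: {w2:.2f} GiB, 2-bit + scales: {w2_grouped:.2f} GiB")
```

Under these assumptions, the per-group scales add a small but visible overhead on top of the raw 2-bit weights.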
Contribution
It introduces progressive adaptive rounding and applies block reconstruction, a technique previously used for vision models, to improve ultra-low-bit quantization of LLMs, reporting better results than existing methods such as AWQ and OmniQuant.
Findings
Improves Wikitext2 perplexity from 14.65 to 6.82 with 2-bit quantization (lower is better).
Increases downstream accuracy from 50.52 to 59.27 with 2-bit quantization.
Reports consistent gains over other quantization schemes across the settings evaluated.
Abstract
Large language models (LLMs) have revolutionized natural language processing, albeit at the cost of immense memory and computation requirements. Post-training quantization (PTQ) is becoming the de facto method to reduce the memory footprint and improve the inference throughput of LLMs. In this work, we aim to push the upper limit of LLM PTQ by optimizing the weight rounding parameters with block reconstruction, a technique widely used in prior work on vision models. We propose TesseraQ, a new state-of-the-art PTQ technique, to quantize the weights of LLMs to ultra-low bits. To effectively optimize the rounding in LLMs and stabilize the reconstruction process, we introduce progressive adaptive rounding. This approach iteratively transitions the soft rounding variables to hard variables during the reconstruction process. Additionally, we optimize the dequantization scale parameters to…
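The idea of moving soft rounding variables to hard ones in stages can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' algorithm: the confidence criterion (distance from 0.5), the hardening schedule, the symmetric 2-bit grid, and all function names are hypothetical. The gradient-based optimization of the remaining soft variables and of the dequantization scale against a block reconstruction loss is omitted.

```python
import numpy as np


def init_soft_rounding(w, scale):
    """Split w / scale into an integer floor and a soft variable in [0, 1)."""
    ratio = w / scale
    base = np.floor(ratio)
    return base, ratio - base


def harden_most_confident(alpha, hard_mask, target_fraction):
    """Fix the most confident soft variables to 0 or 1 until
    target_fraction of all variables are hard (assumed criterion)."""
    n_new = int(round(target_fraction * alpha.size)) - int(hard_mask.sum())
    if n_new <= 0:
        return alpha, hard_mask
    flat_alpha = alpha.ravel().copy()
    flat_mask = hard_mask.ravel().copy()
    # Confidence = distance from 0.5; already-hard entries are excluded.
    confidence = np.where(flat_mask, -np.inf, np.abs(flat_alpha - 0.5))
    idx = np.argsort(confidence)[-n_new:]
    flat_alpha[idx] = np.round(flat_alpha[idx])
    flat_mask[idx] = True
    return flat_alpha.reshape(alpha.shape), flat_mask.reshape(alpha.shape)


def dequantize(base, alpha, scale, bits):
    """Map integer floor plus rounding variable back to real values."""
    qmin, qmax = -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    return np.clip(base + alpha, qmin, qmax) * scale


rng = np.random.default_rng(0)
w = rng.normal(size=(4, 8))          # toy weight matrix
bits = 2
scale = np.abs(w).max() / 2 ** (bits - 1)

base, alpha = init_soft_rounding(w, scale)
hard_mask = np.zeros(alpha.shape, dtype=bool)
for fraction in (0.25, 0.5, 0.75, 1.0):  # hypothetical schedule
    # A real pipeline would optimize the still-soft alphas (and the scale)
    # against a block reconstruction loss here; that step is omitted.
    alpha, hard_mask = harden_most_confident(alpha, hard_mask, fraction)

w_q = dequantize(base, alpha, scale, bits)
```

Once the schedule reaches a fraction of 1.0, every rounding variable is exactly 0 or 1, so each dequantized weight lies on the 2-bit grid.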
Taxonomy
Topics: Medical Imaging Techniques and Applications · Image Processing Techniques and Applications · Advancements in Photolithography Techniques
