ADMM-Q: An Improved Hessian-based Weight Quantizer for Post-Training Quantization of Large Language Models

Ryan Lucas; Mehdi Makni; Xiang Meng; Adam Deng; Rahul Mazumder

arXiv:2605.11222 · cs.LG · May 13, 2026

TL;DR

ADMM-Q is a novel layer-wise weight quantization algorithm for large language models (LLMs). It uses an ADMM-based approach with convergence guarantees to improve model utility at aggressive (sub-4-bit) quantization levels.

Contribution

The paper introduces ADMM-Q, a modular, layer-wise weight quantization method based on the Alternating Direction Method of Multipliers (ADMM) that improves the accuracy and efficiency of post-training quantization (PTQ) for LLMs.
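For reference, the layer-wise problem that Hessian-based quantizers target, and the textbook scaled-form ADMM splitting of it, can be written as below. The notation is an assumption for illustration: X holds calibration inputs to the layer (so the layer computes XW), W_0 is the full-precision weight matrix, \mathcal{Q} is the set of matrices whose entries lie on the quantization grid, \Pi_{\mathcal{Q}} rounds each entry to its nearest grid point, and \rho_k is a penalty that grows over iterations. The paper's combinatorial ADMM variant, its preconditioning, and its penalty schedule may differ from this generic form.

```latex
% Layer-wise reconstruction problem; H = X^\top X is (up to a factor of 2)
% the Hessian of this objective with respect to \widehat{W}.
\min_{\widehat{W} \in \mathcal{Q}} \ \lVert X W_0 - X \widehat{W} \rVert_F^2
  = \operatorname{tr}\!\bigl((W_0 - \widehat{W})^\top H \,(W_0 - \widehat{W})\bigr),
  \qquad H = X^\top X

% Scaled-form ADMM on the split W = Z (W continuous, Z \in \mathcal{Q}, scaled dual U):
\begin{aligned}
W^{k+1} &= (2H + \rho_k I)^{-1}\bigl(2H W_0 + \rho_k (Z^k - U^k)\bigr),\\
Z^{k+1} &= \Pi_{\mathcal{Q}}\bigl(W^{k+1} + U^k\bigr),\\
U^{k+1} &= U^k + W^{k+1} - Z^{k+1}.
\end{aligned}
```

Raising \rho_k over the iterations pulls the continuous iterate W toward the grid-valued Z, which is one way to read "gradually enforcing the quantization constraints"; when \rho changes between iterations, the scaled dual U must be rescaled by \rho_k / \rho_{k+1}.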

Findings

1. Reduces WikiText-2 perplexity on Qwen3-8B across several quantization settings.
2. Decreases perplexity from 12.85 to 10.06 in the weight-only setting.
3. Achieves lower perplexity when used within the SmoothQuant and SpinQuant procedures.

Abstract

Quantization is an effective strategy to reduce the storage and computation footprint of large language models (LLMs). Post-training quantization (PTQ) is a leading approach for compressing LLMs. Popular weight quantization procedures, including GPTQ and round-to-nearest (RTN), lose model utility, especially at aggressive (sub-4-bit) quantization levels. We propose ADMM-Q, a novel weight quantization algorithm for the layer-wise quantization problem. Our algorithm is based on a combinatorial variant of the Alternating Direction Method of Multipliers (ADMM). Our operator-splitting procedure updates weights continuously to minimize the layer-wise reconstruction error, while gradually enforcing the quantization constraints with convergence guarantees. We propose additional algorithmic enhancements (e.g., penalty scheduling, preconditioning, and a local search post-processing step) to make…
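The ingredients named in the abstract (continuous weight updates against the layer-wise reconstruction error, a quantization constraint enforced gradually through a growing penalty, and a local search post-processing step) can be illustrated with a small NumPy sketch. It is not the authors' implementation: the per-column min/max uniform grid, the textbook scaled-form ADMM updates, the geometric penalty schedule (`rho0`, `growth`), the greedy ±1-grid-step local search, and all function names (`make_grid`, `project`, `recon_error`, `admm_quantize`, `local_search`) are hypothetical choices for illustration, and preconditioning is omitted. Because the grid is a nonconvex set, these plain ADMM iterations carry no convergence guarantee of their own.

```python
import numpy as np


def make_grid(W0, bits):
    """Per-output-column asymmetric uniform grid from W0's min/max (assumed scheme)."""
    qmax = 2 ** bits - 1
    wmin, wmax = W0.min(axis=0), W0.max(axis=0)
    scale = np.maximum(wmax - wmin, 1e-8) / qmax
    zero = np.round(-wmin / scale)
    return scale, zero, qmax


def project(W, scale, zero, qmax):
    """Nearest grid point per entry: round in integer units, clip, dequantize."""
    q = np.clip(np.round(W / scale + zero), 0, qmax)
    return scale * (q - zero)


def recon_error(H, W0, W):
    """||X W0 - X W||_F^2 expressed through H = X^T X."""
    D = W0 - W
    return float(np.sum(D * (H @ D)))


def admm_quantize(H, W0, scale, zero, qmax, iters=100, rho0=1e-2, growth=1.1):
    """Textbook scaled-form ADMM with a geometric penalty schedule (illustrative)."""
    d = H.shape[0]
    rho = rho0 * np.mean(np.diag(H))      # penalty relative to the scale of H
    rhs0 = 2 * H @ W0
    Z = project(W0, scale, zero, qmax)    # start from round-to-nearest (RTN)
    U = np.zeros_like(W0)                 # scaled dual variable
    for _ in range(iters):
        # W-step: minimize the quadratic error plus (rho/2)||W - Z + U||^2
        W = np.linalg.solve(2 * H + rho * np.eye(d), rhs0 + rho * (Z - U))
        # Z-step: project onto the quantization grid
        Z = project(W + U, scale, zero, qmax)
        # Dual step, then grow rho; the scaled dual is rescaled by rho_old / rho_new
        U = (U + W - Z) / growth
        rho *= growth
    return Z


def local_search(H, W0, Z, scale, zero, qmax, sweeps=2):
    """Greedy +/-1 grid-step moves that strictly reduce the layer-wise error."""
    Z = Z.copy()
    q = np.rint(Z / scale + zero)         # integer codes of the current solution
    HD = H @ (W0 - Z)                     # cached H (W0 - Z)
    d_in, d_out = Z.shape
    for _ in range(sweeps):
        for j in range(d_out):
            for i in range(d_in):
                for step in (-1, 1):
                    if not 0 <= q[i, j] + step <= qmax:
                        continue
                    t = step * scale[j]   # change applied to Z[i, j]
                    # Exact change in the column-j error d^T H d when d_i -> d_i - t
                    delta = -2 * t * HD[i, j] + t * t * H[i, i]
                    if delta < -1e-12:
                        q[i, j] += step
                        Z[i, j] += t
                        HD[:, j] -= t * H[:, i]
                        break
    return Z


rng = np.random.default_rng(0)
X = rng.standard_normal((256, 64)) @ rng.standard_normal((64, 64))  # correlated inputs
W0 = rng.standard_normal((64, 16))
H = X.T @ X
scale, zero, qmax = make_grid(W0, bits=3)
W_rtn = project(W0, scale, zero, qmax)
W_admm = admm_quantize(H, W0, scale, zero, qmax)
W_ls = local_search(H, W0, W_admm, scale, zero, qmax)
for name, W in [("RTN", W_rtn), ("ADMM", W_admm), ("ADMM + local search", W_ls)]:
    print(name, recon_error(H, W0, W))
```

Because the local search only accepts moves that strictly lower the error, its output is never worse than its starting point; whether the ADMM stage beats RTN on a given layer is not guaranteed by this sketch and depends on the data and the schedule.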

Code & Models

Repositories

ufere/Assingment_1 (GitHub)