WUSH: Near-Optimal Adaptive Transforms for LLM Quantization

Jiale Chen; Vage Egiazarian; Roberto L. Castro; Torsten Hoefler; Dan Alistarh

arXiv:2512.00956 · cs.LG · February 3, 2026

TL;DR

WUSH is a data-dependent, provably near-optimal linear transform for quantizing large language models. It improves W4A4 accuracy by up to 2.8 points over fixed Hadamard baselines and reaches up to 6.6× per-layer throughput relative to BF16.

Contribution

The paper derives a closed-form, data-dependent transform that is provably near-optimal for joint LLM weight and activation quantization, pairing this theoretical guarantee with an efficient fused GPU implementation.

Findings

01. WUSH improves W4A4 accuracy by up to 2.8 points over Hadamard baselines.

02. WUSH achieves up to 6.6× throughput per layer compared to BF16.

03. Empirical results demonstrate WUSH's effectiveness across different quantization formats.

Abstract

Quantizing LLM weights and activations is a standard approach for efficient deployment, but a few extreme outliers can stretch the dynamic range and amplify low-bit quantization errors. Prior transform-based mitigations (e.g., Hadamard rotations) are fixed and data-agnostic, and their optimality for quantization has remained unclear. We derive closed-form optimal linear blockwise transforms for joint weight-activation quantization under standard RTN AbsMax-scaled block quantizers, covering both integer and floating-point formats. The resulting construction, WUSH, combines a Hadamard backbone with a data-dependent second-moment component to form a non-orthogonal transform that is provably near-optimal for FP and INT quantizers under mild assumptions while admitting an efficient fused GPU implementation. Empirically, WUSH improves W4A4 accuracy over the strongest Hadamard-based baselines…
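
The baseline setting the abstract describes can be illustrated with a small sketch: a round-to-nearest (RTN) AbsMax-scaled INT4 block quantizer, applied to one block with and without a fixed Hadamard rotation. This is not the WUSH transform; the data-dependent second-moment component is not reproduced here. The function names (`hadamard`, `rtn_absmax_int`, `matvec`, `mse`), the block size of 4, and the example values are all assumptions chosen for illustration.

```python
import math

def hadamard(n):
    """Orthonormal Sylvester-Hadamard matrix; n must be a power of two."""
    h = [[1.0]]
    while len(h) < n:
        h = [row + row for row in h] + [row + [-v for v in row] for row in h]
    scale = 1.0 / math.sqrt(n)
    return [[v * scale for v in row] for row in h]

def matvec(m, v):
    """Plain matrix-vector product on nested lists."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def rtn_absmax_int(block, bits=4):
    """Quantize and dequantize one block: symmetric integer grid,
    scale set by the block's largest magnitude, round to nearest."""
    qmax = 2 ** (bits - 1) - 1          # 7 for INT4
    absmax = max(abs(v) for v in block)
    if absmax == 0.0:
        return list(block)
    scale = absmax / qmax
    return [round(v / scale) * scale for v in block]

def mse(a, b):
    return sum((p - q) ** 2 for p, q in zip(a, b)) / len(a)

# Hypothetical block: one outlier (14.0) stretches the dynamic range.
x = [14.0, 0.8, -0.6, 0.4]
H = hadamard(len(x))

# Direct quantization: the outlier sets the scale, small values round to 0.
err_plain = mse(x, rtn_absmax_int(x))

# Rotate, quantize, rotate back. This H is symmetric and orthonormal,
# so it is its own inverse.
y = matvec(H, x)
x_rot = matvec(H, rtn_absmax_int(y))
err_rot = mse(x, x_rot)

print(f"MSE without rotation: {err_plain:.3f}")
print(f"MSE with Hadamard rotation: {err_rot:.3f}")
```

On this hand-picked block the rotation spreads the outlier's energy across all four coordinates, so the quantization grid becomes finer and the error drops from about 0.29 to about 0.077. A fixed rotation like this is data-agnostic, which is the limitation the abstract says WUSH addresses.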

Taxonomy

Topics: Advanced Data Compression Techniques · Parallel Computing and Optimization Techniques · Digital Filter Design and Implementation