BWLA: Breaking the Barrier of W1AX Post-Training Quantization for LLMs

Zhixiong Zhao; Zukang Xu; Dawei Yang

arXiv:2605.00422·cs.LG·May 4, 2026

TL;DR

BWLA is a post-training quantization method for LLMs that pairs 1-bit weights with low-bit activations (e.g., 6 bits), preserving accuracy while delivering a 3.26x inference speedup.

Contribution

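According to the authors, it is the first post-training quantization framework to quantize weights to 1 bit and activations to low bit-widths jointly while preserving high accuracy, which enables end-to-end acceleration of LLMs. It relies on two transformations: the Orthogonal-Kronecker Transformation (OKT) and the Proximal SVD Projection (PSP).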

Findings

01

Achieves a Wikitext2 perplexity of 11.92 on Qwen3-32B with 1-bit weights and 6-bit activations.

02

Improves performance on five zero-shot tasks by over 70%.

03

Provides 3.26x inference speedup.

Abstract

Large language models (LLMs) have driven major progress in NLP, yet their substantial memory and compute demands still hinder practical deployment. Binarization can compress weights to 1 bit, fundamentally lowering compute and bandwidth cost. However, existing methods cannot address activation heavy tails and thus must keep activations in high precision, preventing true end-to-end acceleration. To overcome this limitation, we propose BWLA (Binarized Weights and Low-bit Activations), the first post-training quantization framework that preserves high accuracy while achieving 1-bit weight quantization together with low-bit activations (e.g., 6 bits). The Orthogonal-Kronecker Transformation (OKT) learns an orthogonal mapping via EM minimization, converting unimodal weights into symmetric bimodal forms while suppressing activation tails and incoherence. The Proximal SVD Projection (PSP) then…
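
To make the W1A6 setting in the abstract concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes plain sign binarization with one scale per output row, symmetric per-token 6-bit activation quantization, and a random (not learned) orthogonal matrix built as a Kronecker product. BWLA's OKT learns its mapping via EM minimization, and that step is not reproduced here. All function names are hypothetical. The sketch only shows that rotating activations and weights by the same orthogonal matrix leaves the full-precision layer output unchanged, which is what lets such a transformation reshape distributions before quantization.

```python
import numpy as np

def binarize_weights(w):
    """Sign binarization with one scale per output row: w ~ alpha * sign(w)."""
    # For fixed signs, mean(|w|) is the L2-optimal scale of each row.
    alpha = np.abs(w).mean(axis=1, keepdims=True)
    signs = np.where(w >= 0, 1.0, -1.0)
    return alpha, signs

def quantize_activations(x, bits=6):
    """Symmetric per-token uniform quantization to `bits` bits."""
    qmax = 2 ** (bits - 1) - 1                      # 31 for 6 bits
    scale = np.abs(x).max(axis=1, keepdims=True) / qmax
    scale = np.where(scale == 0, 1.0, scale)        # guard all-zero rows
    q = np.clip(np.round(x / scale), -qmax, qmax)
    return q, scale

def random_orthogonal(n, rng):
    """Random orthogonal matrix from a QR decomposition (stand-in for a learned one)."""
    q, _ = np.linalg.qr(rng.standard_normal((n, n)))
    return q

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 12))    # activations: tokens x in_features
w = rng.standard_normal((8, 12))    # weights: out_features x in_features

# The Kronecker product of two orthogonal matrices is orthogonal (3 * 4 = 12),
# so a large rotation can be stored as two small factors.
rot = np.kron(random_orthogonal(3, rng), random_orthogonal(4, rng))

y_ref = x @ w.T                     # full-precision layer output
y_rot = (x @ rot) @ (w @ rot).T     # x R R^T w^T, identical up to rounding

# Quantize in the rotated space: 1-bit weights, 6-bit activations.
alpha, signs = binarize_weights(w @ rot)
x_int, x_scale = quantize_activations(x @ rot, bits=6)
y_w1a6 = (x_int * x_scale) @ (alpha * signs).T

rel_err = np.linalg.norm(y_w1a6 - y_ref) / np.linalg.norm(y_ref)
print(f"relative output error under W1A6: {rel_err:.3f}")
```

With a random rotation the quantized output error stays large, since nothing has been optimized. The sketch is meant to show the mechanics of the setting, not the accuracy the paper reports.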
