ConQuR: Corner Aligned Activation Quantization via Optimized Rotations for LLMs
Chayne Thrash, Ali Abbasi, Soheil Kolouri

TL;DR
This paper introduces a lightweight post-training method that calibrates orthogonal rotations for activation quantization in large language models, avoiding both end-to-end rotation training and large stored activation corpora.
Contribution
The paper proposes an orthogonal rotation calibration technique that aligns normalized activations with the corners of an inscribed hypercube using a closed-form update, enabling efficient online calibration.
Findings
Achieves competitive perplexity and reasoning performance on Llama models.
Avoids costly end-to-end training and large offline activation storage.
Provides an efficient closed-form update for rotation calibration.
Abstract
Large language models (LLMs) are costly to deploy due to their large memory footprint and high inference cost. Weight-activation quantization can reduce these costs, but low-bit activation quantization remains difficult because activation outliers induce large quantization error. Recent rotation-based methods address this by applying orthogonal transformations that redistribute activation magnitude across dimensions, but existing approaches either require expensive end-to-end rotation training or rely on stored activation corpora, introducing significant compute or storage overhead. We propose a lightweight post-training rotation calibration method for LLM activation quantization. Our method learns orthogonal rotations that align normalized activations with the corners of an inscribed hypercube, encouraging activation energy to be distributed more evenly across dimensions. This…
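The corner-alignment idea described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the truncated abstract does not give the update rule, so the code below assumes an alternating scheme in which each normalized activation is assigned to its nearest corner of the hypercube inscribed in the unit sphere, and the rotation is then refit with the closed-form orthogonal Procrustes solution (an SVD). The function names `corner_targets`, `procrustes_rotation`, and `calibrate_rotation`, the iteration count, and the random initialization are all hypothetical choices made for illustration.

```python
import numpy as np


def corner_targets(z):
    """Nearest corner of the hypercube inscribed in the unit sphere.

    Each corner has entries +/- 1/sqrt(d), so all coordinates carry
    equal magnitude.
    """
    d = z.shape[-1]
    signs = np.where(z >= 0, 1.0, -1.0)
    return signs / np.sqrt(d)


def procrustes_rotation(x, t):
    """Orthogonal R minimizing ||x @ R - t||_F, in closed form via SVD."""
    u, _, vt = np.linalg.svd(x.T @ t)
    return u @ vt


def calibrate_rotation(x, n_iters=20, seed=0):
    """Alternate corner assignment and Procrustes refit (assumed scheme)."""
    rng = np.random.default_rng(seed)
    d = x.shape[1]
    # Normalize each activation vector to unit length.
    x = x / np.linalg.norm(x, axis=1, keepdims=True)
    # Start from a random orthogonal matrix (assumption, not from the paper).
    r, _ = np.linalg.qr(rng.standard_normal((d, d)))
    for _ in range(n_iters):
        t = corner_targets(x @ r)
        r = procrustes_rotation(x, t)
    return r
```

Calling `calibrate_rotation(x)` on an array `x` of shape `(n, d)` returns a `d × d` orthogonal matrix. On activations with a dominant outlier channel, the rotated vectors `x @ r` should have a smaller peak coordinate than the originals, which is the property that makes low-bit quantization easier.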
