MUXQ: Mixed-to-Uniform Precision MatriX Quantization via Low-Rank Outlier Decomposition

Seoungsub Lee, In Seo Kim, and Seon Wook Kim

arXiv:2604.04701·cs.LG·April 7, 2026

TL;DR

MUXQ is a quantization method that detects outlier channels in LLM input activations and redistributes their magnitudes across channels through a small auxiliary matrix, enabling low-precision INT quantization with minimal accuracy loss for efficient on-device inference.

Contribution

The paper proposes a low-rank outlier decomposition technique that improves activation quantization in LLMs and addresses the hardware inefficiencies caused by outliers.

Findings

01. MUXQ achieves lower perplexity on GPT-2 models than naive quantization.

02. It enables INT8 quantization of activations and weights with accuracy close to FP16.

03. MUXQ maintains stable low-precision inference with modest computational overhead.

Abstract

Large language models (LLMs) have achieved outstanding performance across a wide range of natural language processing tasks, but their enormous parameter counts impose substantial memory and computational overheads. This challenge is particularly critical in NPU-based on-device environments, where FP16/FP32 computation is inefficient and integer (INT) quantization is therefore essential. However, existing methods, including ZeroQuant, LLM.int8(), and SmoothQuant, do not fully address input-activation outliers and the associated hardware inefficiencies. To overcome these limitations, we propose MUXQ (Mixed-to-Uniform Quantization). MUXQ detects outlier channels in input activations and introduces a small auxiliary matrix that redistributes outlier magnitudes across channels, thereby alleviating the outlier problem. This enables even activation outliers to be quantized at low-precision INT…
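
The effect described in the abstract can be illustrated with a small NumPy sketch. This is not the MUXQ algorithm, whose details are not given on this page. It is a simple stand-in that assumes outlier channels are found by a magnitude threshold and then split into `k` equal parts stored in a small auxiliary block, with the matching weight rows duplicated so the matrix product is unchanged. The function names, the threshold of 6.0, the value of `k`, and the synthetic data are all hypothetical choices made for illustration.

```python
import numpy as np

def quantize_int8(x):
    """Symmetric per-tensor INT8 quantization: returns (int8 values, scale)."""
    scale = np.max(np.abs(x)) / 127.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def int8_matmul(x, w):
    """Quantize both operands to INT8, multiply in INT32, then dequantize."""
    xq, sx = quantize_int8(x)
    wq, sw = quantize_int8(w)
    acc = xq.astype(np.int32) @ wq.astype(np.int32)
    return acc.astype(np.float64) * sx * sw

def find_outlier_channels(x, threshold=6.0):
    """Indices of activation channels (columns) whose peak magnitude exceeds threshold."""
    return np.where(np.max(np.abs(x), axis=0) > threshold)[0]

def split_outlier_channels(x, w, outliers, k=8):
    """Illustrative stand-in, NOT the paper's method: divide each outlier
    channel by k and append k-1 copies of the shrunken channel as an auxiliary
    block. Duplicating the matching weight rows keeps x @ w exact."""
    x_new = x.copy()
    x_new[:, outliers] /= k
    extra_x = np.repeat(x_new[:, outliers], k - 1, axis=1)
    extra_w = np.repeat(w[outliers, :], k - 1, axis=0)
    return np.hstack([x_new, extra_x]), np.vstack([w, extra_w])

# Synthetic activations with two artificially inflated channels (assumed data).
rng = np.random.default_rng(0)
x = rng.normal(size=(64, 128))
x[:, [5, 77]] *= 40.0
w = rng.normal(size=(128, 32)) * 0.1
ref = x @ w  # full-precision reference

outliers = find_outlier_channels(x)
x2, w2 = split_outlier_channels(x, w, outliers, k=8)

err_naive = np.mean(np.abs(int8_matmul(x, w) - ref))
err_split = np.mean(np.abs(int8_matmul(x2, w2) - ref))
print("outlier channels:", outliers.tolist())
print("mean abs error, naive INT8:", err_naive)
print("mean abs error, after redistribution:", err_split)
```

With per-tensor scaling, a few large channels set the quantization step for every channel, so shrinking their peak magnitude lets the remaining channels use more of the INT8 range while the whole product stays in a single uniform precision. The cost in this sketch is a slightly wider matrix multiplication.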
