OSC: Hardware Efficient W4A4 Quantization via Outlier Separation in Channel Dimension

Zhiyuan Zhang; Yanzhao Li; Zhiqiang Zou; Bai Du; Yupeng Sun; Hui Dong; Hui Wang

arXiv:2604.12782 · cs.LG · April 15, 2026

TL;DR

This paper introduces OSC, a hardware-efficient 4-bit (W4A4) quantization method for large language models. It suppresses activation outliers by separating them along the channel dimension, which limits accuracy loss while speeding up inference.

Contribution

OSC is an outlier suppression framework that combines dual-path computation (a 4-bit GEMM path plus a 16-bit branch GEMM path) with structured coalescing of outlier channels for efficient low-bit model deployment.

Findings

01. Limits the accuracy drop to 2.19 and 1.12 points on the Qwen models evaluated.

02. Achieves up to a 1.78x speedup over a W8A8 baseline.

03. Effectively suppresses activation outliers in 4-bit quantization.

Abstract

While 4-bit quantization is essential for high-throughput deployment of Large Language Models, activation outliers often lead to significant accuracy degradation due to the restricted dynamic range of low-bit formats. In this paper, we systematically investigate the spatial distribution of outliers and demonstrate a token-persistent structural clustering effect, where high-magnitude outliers consistently occupy fixed channels across tokens. Building on this insight, we propose OSC, a hardware-efficient framework for outlier suppression. During inference, OSC executes a dual-path computation consisting of a low-precision 4-bit General Matrix Multiplication (GEMM) path and a high-precision 16-bit branch GEMM path. Specifically, OSC uses an offline group-wise strategy to identify the channels where outliers are located and then performs structured sub-tensor extraction to coalesce these…
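
The dual-path idea described in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation: the function names, the peak-magnitude selection rule in `find_outlier_channels`, and the symmetric per-token / per-output-channel scaling are all assumptions made for illustration, and integer arithmetic on `int8` codes stands in for a real 4-bit GEMM kernel. The sketch only shows why routing a few fixed outlier channels through a 16-bit branch helps the remaining channels fit the 4-bit range.

```python
import numpy as np

def quantize_int4(x, axis):
    """Symmetric 4-bit quantization along `axis` (assumed scheme, not from the paper).

    Returns integer codes in [-8, 7] and the per-slice scales.
    """
    max_abs = np.max(np.abs(x), axis=axis, keepdims=True)
    scale = np.where(max_abs > 0, max_abs / 7.0, 1.0)
    codes = np.clip(np.round(x / scale), -8, 7).astype(np.int8)
    return codes, scale

def find_outlier_channels(calib_acts, k):
    """Offline step (hypothetical criterion): the k channels with the largest peak magnitude.

    calib_acts has shape (tokens, channels).
    """
    peak = np.max(np.abs(calib_acts), axis=0)
    return np.sort(np.argsort(peak)[-k:])

def dual_path_matmul(x, w, outlier_idx):
    """Compute x @ w with a 16-bit branch for outlier channels and a 4-bit path for the rest.

    x: (tokens, channels) activations, w: (channels, out_features) weights.
    """
    mask = np.zeros(x.shape[1], dtype=bool)
    mask[outlier_idx] = True

    # High-precision branch: gather the outlier channels into a small dense
    # sub-tensor, round operands to fp16, accumulate in fp32.
    x_hi = x[:, mask].astype(np.float16).astype(np.float32)
    w_hi = w[mask, :].astype(np.float16).astype(np.float32)
    y_hi = x_hi @ w_hi

    # Low-precision path: 4-bit codes for the remaining channels,
    # per-token activation scales and per-output-channel weight scales.
    xq, xs = quantize_int4(x[:, ~mask], axis=1)
    wq, ws = quantize_int4(w[~mask, :], axis=0)
    y_lo = (xq.astype(np.int32) @ wq.astype(np.int32)).astype(np.float32) * xs * ws

    return y_lo + y_hi

# Toy data: two channels carry token-persistent outliers (synthetic, for illustration only).
rng = np.random.default_rng(0)
x = rng.standard_normal((16, 64)).astype(np.float32)
x[:, [3, 40]] *= 50.0
w = rng.standard_normal((64, 8)).astype(np.float32)

idx = find_outlier_channels(x, k=2)
ref = x @ w
err_dual = np.abs(dual_path_matmul(x, w, idx) - ref).mean()
err_naive = np.abs(dual_path_matmul(x, w, np.array([], dtype=int)) - ref).mean()
print("outlier channels:", idx)
print("mean abs error, dual-path vs. all-4-bit:", err_dual, err_naive)
```

Passing an empty index array to `dual_path_matmul` quantizes every channel to 4 bits, which gives a simple baseline for comparison. In that baseline the outlier channels inflate each token's scale, so most ordinary activations round to zero; separating the outlier channels avoids this.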
