Improving Neural Network Quantization without Retraining using Outlier Channel Splitting
Ritchie Zhao, Yuwei Hu, Jordan Dotzel, Christopher De Sa, Zhiru Zhang

TL;DR
This paper introduces Outlier Channel Splitting (OCS), a method for quantizing neural networks without retraining. OCS duplicates channels that contain outliers and halves their values, which moves the outliers toward the center of the distribution and improves quantization accuracy on standard hardware.
Contribution
The paper proposes OCS, a training-free outlier handling technique that enhances neural network quantization performance on commodity hardware.
Findings
OCS outperforms existing clipping methods on ImageNet classification.
OCS achieves comparable or better results with minimal overhead.
The method also works effectively on language modeling tasks.
Abstract
Quantization can improve the execution latency and energy efficiency of neural networks on both commodity GPUs and specialized accelerators. The majority of existing literature focuses on training quantized DNNs, while this work examines the less-studied topic of quantizing a floating-point model without (re)training. DNN weights and activations follow a bell-shaped distribution post-training, while practical hardware uses a linear quantization grid. This leads to challenges in dealing with outliers in the distribution. Prior work has addressed this by clipping the outliers or using specialized hardware. In this work, we propose outlier channel splitting (OCS), which duplicates channels containing outliers, then halves the channel values. The network remains functionally identical, but affected outliers are moved toward the center of the distribution. OCS requires no additional training…
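The splitting step described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function name `split_outlier_channel`, the choice of a plain linear layer `y = W @ x`, and the rule of splitting only the single channel with the largest-magnitude weight are all assumptions made for illustration. The sketch duplicates one input channel, halves its weights in both copies, and duplicates the matching input activation, so the layer output should be unchanged while the largest weight magnitude shrinks.

```python
import numpy as np

def split_outlier_channel(W, x):
    """Duplicate the input channel holding the largest-magnitude weight and halve it.

    Hypothetical helper for illustration only.
    W: (out_features, in_features) weights of a linear layer y = W @ x.
    x: (in_features,) input activation vector.
    Returns (W_split, x_split), each with one extra input channel.
    """
    # Column (input channel) that contains the largest absolute weight.
    _, c = np.unravel_index(np.argmax(np.abs(W)), W.shape)
    half = W[:, c:c + 1] / 2.0
    # Append a halved copy of the channel, then halve the original in place.
    W_split = np.concatenate([W, half], axis=1)
    W_split[:, c] = half[:, 0]
    # The duplicated channel reads the same input activation as the original.
    x_split = np.concatenate([x, x[c:c + 1]])
    return W_split, x_split

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 6))
W[2, 3] = 8.0  # inject an artificial outlier weight
x = rng.normal(size=6)

W_split, x_split = split_outlier_channel(W, x)

# Functionally identical output, but a smaller maximum weight magnitude.
print(np.allclose(W @ x, W_split @ x_split))
print(np.abs(W).max(), np.abs(W_split).max())
```

Because a linear quantization grid is typically scaled to the largest magnitude it must represent, shrinking that maximum leaves finer steps for the bulk of the distribution. The cost is one extra channel per split, which is the overhead the summary refers to.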
Taxonomy
Topics: Advanced Neural Network Applications · Adversarial Robustness in Machine Learning · Domain Adaptation and Few-Shot Learning
