Improving Neural Network Quantization without Retraining using Outlier Channel Splitting

Ritchie Zhao; Yuwei Hu; Jordan Dotzel; Christopher De Sa; Zhiru Zhang

arXiv:1901.09504 · cs.LG · May 24, 2019 · 151 cites

TL;DR

This paper introduces Outlier Channel Splitting (OCS), a method for quantizing neural networks without retraining. OCS duplicates channels that contain outliers and halves their values, which improves quantization accuracy on standard hardware.

Contribution

The paper proposes OCS, a training-free outlier handling technique that enhances neural network quantization performance on commodity hardware.

Findings

01. OCS outperforms existing clipping methods on ImageNet classification.

02. OCS achieves comparable or better results with minimal overhead.

03. The method also works effectively on language modeling tasks.

Abstract

Quantization can improve the execution latency and energy efficiency of neural networks on both commodity GPUs and specialized accelerators. The majority of existing literature focuses on training quantized DNNs, while this work examines the less-studied topic of quantizing a floating-point model without (re)training. DNN weights and activations follow a bell-shaped distribution post-training, while practical hardware uses a linear quantization grid. This leads to challenges in dealing with outliers in the distribution. Prior work has addressed this by clipping the outliers or using specialized hardware. In this work, we propose outlier channel splitting (OCS), which duplicates channels containing outliers, then halves the channel values. The network remains functionally identical, but affected outliers are moved toward the center of the distribution. OCS requires no additional training…
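The splitting step described in the abstract can be illustrated with a minimal NumPy sketch for a single linear layer `y = x @ W`. This is not the authors' implementation: the function name `split_outlier_channel`, the choice of the channel holding the single largest-magnitude weight, and the explicit duplication of the input activation are assumptions made for illustration only.

```python
import numpy as np

def split_outlier_channel(W, x):
    """Hypothetical helper: split the input channel holding the largest weight.

    W: (in_features, out_features) weights of a linear layer y = x @ W.
    x: (batch, in_features) input activations.
    Returns (W_split, x_split), each with one extra input channel.
    """
    # Index of the input channel (row of W) containing the largest |weight|.
    row = np.unravel_index(np.argmax(np.abs(W)), W.shape)[0]
    # Halve that channel's weights, then append a copy of the halved row.
    W_split = W.copy()
    W_split[row] /= 2.0
    W_split = np.vstack([W_split, W_split[row:row + 1]])
    # Duplicate the matching activation so both halves see the same input.
    x_split = np.hstack([x, x[:, row:row + 1]])
    return W_split, x_split

# Toy layer with one outlier weight (6.0) in input channel 1.
W = np.array([[0.5, -0.3],
              [0.1, 6.0],
              [-0.4, 0.2]])
x = np.array([[1.0, 2.0, 3.0],
              [-1.0, 0.5, 2.0]])

W_split, x_split = split_outlier_channel(W, x)
same_output = np.allclose(x @ W, x_split @ W_split)  # layer output is preserved
max_before = np.abs(W).max()        # range the quantization grid must cover
max_after = np.abs(W_split).max()   # smaller range after the split
```

In this toy case the layer output is unchanged while the largest weight magnitude drops from 6.0 to 3.0, so a linear quantization grid has a narrower range to cover. The cost is one extra channel, which is the overhead the method trades for that narrower range.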

Taxonomy

Topics: Advanced Neural Network Applications · Adversarial Robustness in Machine Learning · Domain Adaptation and Few-Shot Learning