SplitQuant: Layer Splitting for Low-Bit Neural Network Quantization

Jaewoo Song; Fangzhen Lin

arXiv:2501.12428 · cs.LG · February 7, 2025

Open Access

TL;DR

SplitQuant is a novel layer splitting method that improves low-bit neural network quantization by better handling outliers, leading to higher accuracy in quantized models like BERT-Tiny.

Contribution

The paper introduces SplitQuant, a new layer splitting technique that preserves outliers and enhances quantization resolution for low-bit neural networks.

Findings

01 Improved INT2 quantization accuracy by up to 3.3 percentage points.

02 Achieved quantized model accuracy comparable to FP32 models.

03 Effective on BERT-Tiny models with minimal accuracy loss.

Abstract

Quantization for deep neural networks (DNNs) is the process of mapping the parameter values of DNNs from original data types to other data types of lower precision to reduce model sizes and make inference faster. Quantization often maps different original values to a single quantized value because the range of the original values is larger than the range of the quantized values. This leads to the degradation of the accuracy of the quantized DNNs. Outliers are a main cause of the degradation of quantization resolution because they enlarge the range of original values. To solve the problem, the percentile method is often used to clip outliers. However, clipping the outliers has another problem of removing the important and strong signals in the DNNs. This paper proposes SplitQuant to keep the outliers and improve the quantization resolution at the same time. SplitQuant narrows down the…
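The trade-off the abstract describes, between keeping outliers and keeping resolution, can be illustrated with a small sketch. This is not SplitQuant itself, whose mechanism falls in the truncated part of the abstract. It only contrasts plain min-max quantization with percentile clipping at 2 bits. The helper names, the synthetic Gaussian weights, the single outlier at 10.0, and the 1st/99th percentile cut-offs are all assumptions made for illustration.

```python
import random


def quantize_dequantize(values, lo, hi, bits):
    """Uniform affine quantization onto 2**bits levels spanning [lo, hi].

    Values outside [lo, hi] are clipped first. Returns the dequantized
    (reconstructed) values so errors can be measured in the original scale.
    """
    levels = 2 ** bits - 1
    scale = (hi - lo) / levels
    out = []
    for v in values:
        clipped = min(max(v, lo), hi)
        q = round((clipped - lo) / scale)  # integer code in [0, levels]
        out.append(lo + q * scale)
    return out


def mean_abs_error(a, b):
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def percentile(sorted_vals, p):
    """Nearest-rank style percentile on an already sorted list (illustrative)."""
    idx = round(p / 100 * (len(sorted_vals) - 1))
    return sorted_vals[idx]


random.seed(0)
# Assumed toy data: 1000 small weights plus one large outlier (last element).
weights = [random.gauss(0.0, 0.1) for _ in range(1000)] + [10.0]

# Option 1: min-max range. The outlier is kept but stretches the range.
full = quantize_dequantize(weights, min(weights), max(weights), bits=2)

# Option 2: percentile clipping. The range is tight but the outlier is lost.
s = sorted(weights)
clip_lo, clip_hi = percentile(s, 1), percentile(s, 99)
clipped = quantize_dequantize(weights, clip_lo, clip_hi, bits=2)

bulk_err_full = mean_abs_error(weights[:-1], full[:-1])
bulk_err_clip = mean_abs_error(weights[:-1], clipped[:-1])
outlier_err_full = abs(weights[-1] - full[-1])
outlier_err_clip = abs(weights[-1] - clipped[-1])

print("distinct values used for the bulk (min-max):", len(set(full[:-1])))
print("bulk error   min-max vs clipped:", bulk_err_full, bulk_err_clip)
print("outlier error min-max vs clipped:", outlier_err_full, outlier_err_clip)
```

In this toy setting, min-max quantization collapses all of the small weights onto a single quantized value, which is the loss of resolution the abstract refers to, while percentile clipping restores resolution for the bulk at the cost of a large error on the outlier. The abstract positions SplitQuant as a way to avoid having to choose between the two.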

Taxonomy

Topics: Brain Tumor Detection and Classification · Neural Networks and Applications

Methods: Contrastive Language-Image Pre-training