Scalar Quantization as Sparse Least Square Optimization
Chen Wang, Xiaomei Yang, Shaomin Fei, Kai Zhou, Xiaofeng Gong, Miao Du, Ruisen Luo

TL;DR
This paper introduces novel scalar quantization algorithms based on sparse least square optimization, addressing limitations of clustering methods and improving efficiency and accuracy in neural network quantization.
Contribution
It proposes new quantization algorithms using $l_1$, $l_1 + l_2$, and $l_0$ regularizations, and establishes a connection to improved k-means clustering, offering a new perspective on quantization.
Findings
The proposed algorithms outperform existing methods in certain bit-width reduction scenarios.
The proposed methods are computationally efficient and maintain low information loss.
The clustering-based scheme is mathematically equivalent to an improved k-means algorithm.
Abstract
Quantization can be used to form new vectors or matrices with shared values close to the originals. In recent years, scalar quantization for value-sharing applications has surged in popularity because it has proven highly useful in reducing the complexity of neural networks. Existing clustering-based quantization techniques, while well developed, have multiple drawbacks, including dependence on the random seed, empty or out-of-range clusters, and high time complexity for a large number of clusters. To overcome these problems, this paper examines scalar quantization from a new perspective, namely sparse least square optimization. Specifically, inspired by the property of sparse least square regression, several quantization algorithms based on $l_1$ least square are proposed. In addition, similar schemes with $l_1 + l_2$ and $l_0$ regularization…
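To make the idea of quantization as sparse least squares concrete, here is a minimal sketch in plain Python. It is not taken from the paper. It assumes one plausible formulation: the sorted values are written as cumulative sums of a coefficient vector (a lower-triangular all-ones design matrix), the coefficients are $l_1$-penalized so that most of them become zero, and each run of consecutive values between nonzero coefficients is then refit to its mean. The function names `soft_threshold` and `l1_quantize`, the unpenalized first coefficient, and the coordinate-descent solver are all illustrative assumptions.

```python
def soft_threshold(x, lam):
    """Soft-thresholding operator used by l1 coordinate descent."""
    if x > lam:
        return x - lam
    if x < -lam:
        return x + lam
    return 0.0


def l1_quantize(values, lam, sweeps=2000, tol=1e-12):
    """Quantize scalars by l1-regularized least squares (illustrative sketch).

    Assumed model (not confirmed by the source): sorted values w satisfy
    w ~ V @ alpha, where V is lower-triangular all-ones, so alpha[j] is the
    jump between consecutive levels. Sparse alpha means few distinct levels.
    """
    n = len(values)
    if n == 0:
        return []
    order = sorted(range(n), key=lambda i: values[i])
    w = [values[i] for i in order]

    alpha = [0.0] * n
    resid = list(w)  # residual w - V @ alpha, with alpha initially zero
    for _ in range(sweeps):
        max_change = 0.0
        for j in range(n):
            norm_sq = n - j  # squared norm of column j (ones in rows j..n-1)
            rho = sum(resid[j:]) + norm_sq * alpha[j]
            # Assumption: the offset alpha[0] is left unpenalized.
            new = rho / norm_sq if j == 0 else soft_threshold(rho, lam) / norm_sq
            delta = new - alpha[j]
            if delta != 0.0:
                for i in range(j, n):
                    resid[i] -= delta
                alpha[j] = new
                max_change = max(max_change, abs(delta))
        if max_change < tol:
            break

    # Least-squares refit on the selected support: each segment of sorted
    # values between nonzero jumps is replaced by its mean.
    starts = [0] + [j for j in range(1, n) if alpha[j] != 0.0]
    quantized_sorted = [0.0] * n
    for s, e in zip(starts, starts[1:] + [n]):
        level = sum(w[s:e]) / (e - s)
        for i in range(s, e):
            quantized_sorted[i] = level

    # Map the quantized values back to the original ordering.
    out = [0.0] * n
    for pos, i in enumerate(order):
        out[i] = quantized_sorted[pos]
    return out


weights = [5.1, 0.0, 0.2, 5.0, 0.1, 5.2]
print(l1_quantize(weights, lam=2.0))
```

In this sketch the penalty weight `lam` controls the number of shared values indirectly: a larger `lam` zeroes more jumps and leaves fewer levels. Unlike a randomly initialized clustering run, the result is deterministic for a given input and `lam`.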
