Effect of Weight Quantization on Learning Models by Typical Case Analysis
Shuhei Kashiwamura, Ayaka Sakata, Masaaki Imaizumi

TL;DR
This paper uses statistical physics techniques to analyze how weight quantization hyperparameters affect learning models, revealing phase transitions, optimal settings, and benefits for overfitting mitigation, with validation via an approximate message-passing algorithm.
Contribution
It introduces a novel application of the replica method to analyze weight quantization effects and proposes an approximate message-passing algorithm for validation.
Findings
An unstable hyperparameter phase appears when the number of bits is small and the quantization width is large.
An optimal quantization width exists that minimizes the error.
Quantization delays the onset of overparameterization and mitigates overfitting.
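To make the two hyperparameters in these findings concrete, here is a minimal sketch of a symmetric uniform quantizer controlled by a bit count and a width. It is an illustrative assumption, not the quantizer defined in the paper: the names `quantize`, `reconstruction_mse`, and `sweep_widths` are hypothetical, and the error computed here is a plain weight reconstruction error, not the error analyzed with the replica method.

```python
def quantize(w, bits, width):
    """Map w to the nearest of 2**bits evenly spaced levels on [-width, width].

    Assumed scheme for illustration only; the paper may define its levels differently.
    """
    if bits < 1 or width <= 0:
        raise ValueError("bits must be >= 1 and width must be positive")
    levels = 2 ** bits
    step = 2.0 * width / (levels - 1)        # spacing between adjacent levels
    clipped = max(-width, min(width, w))     # values outside the range saturate
    index = round((clipped + width) / step)  # nearest level index (ties follow Python's round)
    return -width + index * step


def reconstruction_mse(weights, bits, width):
    """Mean squared difference between the weights and their quantized values."""
    return sum((w - quantize(w, bits, width)) ** 2 for w in weights) / len(weights)


def sweep_widths(weights, bits, widths):
    """Return (width, reconstruction MSE) pairs for a fixed number of bits."""
    return [(width, reconstruction_mse(weights, bits, width)) for width in widths]
```

Calling `sweep_widths(weights, bits=2, widths=[0.5, 1.0, 2.0, 4.0])` on a sample of weights shows how the two knobs interact in this toy setting: a narrow width clips large weights, while a wide one spreads the few available levels thinly.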
Abstract
This paper examines the quantization methods used in large-scale data analysis models and their hyperparameter choices. The recent surge in data analysis scale has significantly increased computational resource requirements. To address this, quantizing model weights has become a prevalent practice in data analysis applications such as deep learning. Quantization is particularly vital for deploying large models on devices with limited computational resources. However, the selection of quantization hyperparameters, like the number of bits and value range for weight quantization, remains an underexplored area. In this study, we employ the typical case analysis from statistical physics, specifically the replica method, to explore the impact of hyperparameters on the quantization of simple learning models. Our analysis yields three key findings: (i) an unstable hyperparameter phase, known as…
Taxonomy
Topics: Advanced Neural Network Applications · Stochastic Gradient Optimization Techniques · Big Data and Digital Economy
