Scaling Laws for Precision in High-Dimensional Linear Regression
Dechen Zhang, Xuan Tang, Yingyu Liang, Difan Zou

TL;DR
This paper develops a theoretical framework for understanding how low-precision quantization affects high-dimensional linear regression, revealing different impacts of multiplicative and additive quantization on model and data capacities.
Contribution
It introduces a theoretical analysis of scaling laws for low-precision training, distinguishing the effects of multiplicative and additive quantization on model and data sizes.
Findings
Both quantization schemes introduce additive errors and reduce effective data size.
Multiplicative quantization preserves the full-precision effective model size, while additive quantization reduces it.
Numerical experiments confirm the theoretical predictions.
Abstract
Low-precision training is critical for optimizing the trade-off between model quality and training costs, necessitating the joint allocation of model size, dataset size, and numerical precision. While empirical scaling laws suggest that quantization impacts effective model and data capacities or acts as an additive error, the theoretical mechanisms governing these effects remain largely unexplored. In this work, we initiate a theoretical study of scaling laws for low-precision training within a high-dimensional sketched linear regression framework. By analyzing multiplicative (signal-dependent) and additive (signal-independent) quantization, we identify a critical dichotomy in their scaling behaviors. Our analysis reveals that while both schemes introduce an additive error and degrade the effective data size, they exhibit distinct effects on effective model size: multiplicative…
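The distinction between signal-dependent and signal-independent quantization can be illustrated with a toy noise model. The sketch below is not the paper's construction: the function names, the uniform noise distribution, and the step sizes are all assumptions chosen for illustration. It shows only the qualitative point that multiplicative error grows with the magnitude of the signal, while additive error does not.

```python
import numpy as np

def quantize_multiplicative(x, rel_step, rng):
    """Hypothetical signal-dependent model: q(x) = x * (1 + eps)."""
    eps = rng.uniform(-rel_step / 2, rel_step / 2, size=x.shape)
    return x * (1.0 + eps)

def quantize_additive(x, step, rng):
    """Hypothetical signal-independent model: q(x) = x + eps."""
    eps = rng.uniform(-step / 2, step / 2, size=x.shape)
    return x + eps

def rms_error(q, x):
    """Root-mean-square quantization error."""
    return float(np.sqrt(np.mean((q - x) ** 2)))

rng = np.random.default_rng(0)
small = np.full(100_000, 0.1)   # small-magnitude signal
large = np.full(100_000, 10.0)  # signal 100x larger

# Multiplicative: error scales with the signal magnitude.
mult_small = rms_error(quantize_multiplicative(small, 0.01, rng), small)
mult_large = rms_error(quantize_multiplicative(large, 0.01, rng), large)

# Additive: error is the same regardless of signal magnitude.
add_small = rms_error(quantize_additive(small, 0.01, rng), small)
add_large = rms_error(quantize_additive(large, 0.01, rng), large)

print("multiplicative error ratio (large/small):", mult_large / mult_small)
print("additive error ratio (large/small):", add_large / add_small)
```

Under these assumptions the multiplicative error ratio comes out close to the ratio of signal magnitudes (100), while the additive ratio stays close to 1.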
Taxonomy
Topics: Advanced Neural Network Applications · Stochastic Gradient Optimization Techniques · Machine Learning and Data Classification
