Beyond Discreteness: Finite-Sample Analysis of Straight-Through Estimator for Quantization
Halyun Jeong, Jack Xin, Penghang Yin

TL;DR
This paper provides the first finite-sample theoretical analysis of the straight-through estimator (STE) for neural network quantization, revealing the importance of sample size and data complexity in ensuring successful training convergence.
Contribution
It introduces a finite-sample analysis of STE, deriving sample complexity bounds and exploring its behavior under label noise in neural network quantization.
Findings
Sample size critically affects STE success.
Derived data-dependent convergence guarantees.
Observed recurrence behavior of STE under label noise.
Abstract
Training quantized neural networks requires addressing the non-differentiable and discrete nature of the underlying optimization problem. To tackle this challenge, the straight-through estimator (STE) has become the most widely adopted heuristic, allowing backpropagation through discrete operations by introducing surrogate gradients. However, its theoretical properties remain largely unexplored, and the few existing works simplify the analysis by assuming an infinite amount of training data. In contrast, this work presents the first finite-sample analysis of STE in the context of neural network quantization. Our theoretical results highlight the critical role of sample size in the success of STE, a key insight absent from existing studies. Specifically, by analyzing the quantization-aware training of a two-layer neural network with binary weights and activations, we derive the sample…
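To make the surrogate-gradient idea concrete, here is a minimal sketch of one common form of STE, the identity straight-through update. It is not the paper's setting: it assumes a linear model with binary weights, squared loss, and noiseless labels on a finite sample, rather than the two-layer network with binary weights and activations analyzed in the paper. The names `binarize` and `ste_step`, the learning rate, and the sizes `n` and `d` are hypothetical choices for illustration.

```python
import random

def binarize(w):
    """Quantize each latent weight to -1.0 or +1.0 (zero maps to +1.0)."""
    return [1.0 if wi >= 0 else -1.0 for wi in w]

def ste_step(w, X, y, lr=0.1):
    """One straight-through update for a linear model with squared loss."""
    q = binarize(w)  # the forward pass sees only the quantized weights
    n = len(y)
    residual = [sum(xj * qj for xj, qj in zip(row, q)) - yi
                for row, yi in zip(X, y)]
    # Gradient of the loss (1 / 2n) * sum(residual^2) with respect to q.
    grad_q = [sum(r * row[j] for r, row in zip(residual, X)) / n
              for j in range(len(w))]
    # Straight-through step: the true derivative dq/dw is zero almost
    # everywhere, so it is replaced by 1 and grad_q is applied directly
    # to the latent float weights.
    return [wi - lr * gi for wi, gi in zip(w, grad_q)]

# Assumed toy data: n samples, d features, labels from a binary teacher.
random.seed(0)
n, d = 200, 8
w_true = [random.choice([-1.0, 1.0]) for _ in range(d)]
X = [[random.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]
y = [sum(xj * tj for xj, tj in zip(row, w_true)) for row in X]

w = [random.gauss(0.0, 0.01) for _ in range(d)]  # latent float weights
for _ in range(100):
    w = ste_step(w, X, y)

print("quantized weights:", binarize(w))
print("teacher weights:  ", w_true)
```

The float weights `w` are kept only as an optimization state; predictions always use `binarize(w)`. Varying `n` in this toy setup is one informal way to see how the amount of training data can influence whether the quantized weights settle on the teacher.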
