GDNSQ: Gradual Differentiable Noise Scale Quantization for Low-bit Neural Networks

Sergey Salishev; Ian Akhremchik

arXiv:2508.14004 · cs.LG · November 12, 2025

TL;DR

This paper introduces GDNSQ, a method for low-bit neural network quantization that learns bit-width, noise scale, and clamp bounds during fine-tuning and reaches competitive accuracy even in the extreme W1A1 setting.

Contribution

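It proposes a fully differentiable, Straight-Through Estimator (STE) based quantization approach with learnable bit-width, noise scale, and clamp bounds, plus an exterior-point penalty that enforces a target average bit-width, enabling effective fine-tuning of ultra-low-bit neural networks.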
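The penalty mechanism can be pictured with a toy exterior-point problem. The sketch below is an illustration under stated assumptions, not the authors' code: each layer gets a real-valued bit-width, a hypothetical quadratic stand-in loss pulls every layer toward a preferred precision `b_pref`, and the constraint mean(b) <= `b_target` is enforced with the textbook quadratic exterior penalty `mu * max(0, mean(b) - b_target)**2`, where `mu` grows after each outer round. The helper names and the schedule (`mu0`, `growth`, `rounds`, `steps`) are hypothetical.

```python
# Toy exterior-point penalty enforcing a target average bit-width.
# Hypothetical setup (not from the paper): the stand-in task loss
# sum((b - b_pref)**2) pulls each layer toward its preferred bit-width.


def penalized_grad(b, b_pref, b_target, mu):
    """Gradient of sum((b - b_pref)**2) + mu * max(0, mean(b) - b_target)**2."""
    n = len(b)
    violation = max(0.0, sum(b) / n - b_target)  # zero once the target is met
    shared = 2.0 * mu * violation / n            # penalty gradient, same for every layer
    return [2.0 * (bi - pi) + shared for bi, pi in zip(b, b_pref)]


def solve(b_pref, b_target, mu0=1.0, growth=10.0, rounds=6, steps=200):
    """Exterior-point loop: minimize the penalized loss, then stiffen the penalty."""
    b = list(b_pref)  # start at the unconstrained optimum
    mu = mu0
    for _ in range(rounds):
        # 1/L, where L = 2 + 2*mu/n is the largest Hessian eigenvalue while
        # the penalty is active; this step size keeps gradient descent stable.
        lr = 1.0 / (2.0 + 2.0 * mu / len(b))
        for _ in range(steps):
            g = penalized_grad(b, b_pref, b_target, mu)
            b = [bi - lr * gi for bi, gi in zip(b, g)]
        mu *= growth  # a larger mu pushes the iterate closer to feasibility
    return b


print([round(x, 3) for x in solve([8.0, 6.0, 4.0], b_target=4.0)])  # [6.0, 4.0, 2.0]
```

With three layers preferring 8, 6, and 4 bits and a 4-bit average target, the iterates approach the constraint from the infeasible side, as exterior-point methods do, and settle just above a 4-bit average. Every layer gives up the same two bits only because the stand-in loss has identical curvature per layer; with a real task loss, layers whose accuracy is more sensitive to rounding would be expected to keep more bits.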

Findings

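1. Achieves competitive accuracy down to the extreme W1A1 setting (1-bit weights, 1-bit activations).
2. Retains the efficiency of the Straight-Through Estimator (STE).
3. Tracks how layer capacity changes as the average bit-width decreases, exposing quantization bottlenecks.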
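The capacity finding builds on the standard view of rounding as additive noise. As a textbook illustration (not the paper's capacity estimator), the sketch below measures the rounding error of a uniform quantizer with 2**bits levels on [lo, hi] and compares its variance with the classic step^2/12, where step = (hi - lo)/(2**bits - 1). Because the step roughly doubles for every bit removed, the noise power roughly quadruples, which is one way to see why capacity shrinks with bit-width. The function names and the uniform test input are illustrative choices.

```python
import random


def quantize(x, lo, hi, bits):
    """Uniform round-to-nearest quantizer with 2**bits levels on [lo, hi]."""
    step = (hi - lo) / (2 ** bits - 1)  # gap between adjacent levels
    c = min(max(x, lo), hi)             # clamp to the representable range
    return lo + step * round((c - lo) / step)


def rounding_noise(bits, lo=-1.0, hi=1.0, samples=50_000, seed=0):
    """Empirical variance of the rounding error x - Q(x) for x ~ Uniform(lo, hi)."""
    rng = random.Random(seed)
    errs = [x - quantize(x, lo, hi, bits)
            for x in (rng.uniform(lo, hi) for _ in range(samples))]
    mean = sum(errs) / len(errs)
    return sum((e - mean) ** 2 for e in errs) / len(errs)


if __name__ == "__main__":
    for bits in (8, 4, 2, 1):
        step = 2.0 / (2 ** bits - 1)
        print(f"{bits} bits: measured {rounding_noise(bits):.3e}, "
              f"step^2/12 = {step ** 2 / 12:.3e}")
```

For a uniformly distributed input the rounding error is exactly uniform on [-step/2, step/2], so the measured variance should match step^2/12 up to sampling error at every bit-width, including the binary (1-bit) case.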

Abstract

Quantized neural networks can be viewed as a chain of noisy channels, where rounding in each layer reduces capacity as bit-width shrinks; the floating-point (FP) checkpoint sets the maximum input rate. We track capacity dynamics as the average bit-width decreases and identify resulting quantization bottlenecks by casting fine-tuning as a smooth, constrained optimization problem. Our approach employs a fully differentiable Straight-Through Estimator (STE) with learnable bit-width, noise scale and clamp bounds, and enforces a target bit-width via an exterior-point penalty; mild metric smoothing (via distillation) stabilizes training. Despite its simplicity, the method attains competitive accuracy down to the extreme W1A1 setting while retaining the efficiency of STE.
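The abstract combines an STE with learnable clamp bounds and a noise scale. The sketch below shows one minimal way such a quantizer can be written for a single scalar, under assumptions that are ours rather than the paper's: the output is q = c + alpha * (Q(c) - c), where c is x clamped to [lo, hi], Q rounds to 2**bits uniform levels, and `alpha` is a hypothetical noise scale between 0 (plain floating-point clamp) and 1 (hard rounding). In the backward pass the rounding error Q(c) - c is treated as a constant (the straight-through trick), so gradients reach x, `lo`, `hi`, and `alpha` only through the clamp and the scaling. `bits` is held at a fixed integer here; how GDNSQ makes the bit-width itself differentiable is not reproduced.

```python
def clamp(x, lo, hi):
    """Clip x into the learnable range [lo, hi]."""
    return min(max(x, lo), hi)


def hard_quant(c, lo, hi, bits):
    """Round c (already inside [lo, hi]) to the nearest of 2**bits uniform levels."""
    step = (hi - lo) / (2 ** bits - 1)
    return lo + step * round((c - lo) / step)


def forward(x, lo, hi, bits, alpha):
    """Hypothetical noise-scaled fake quantization: q = c + alpha * (Q(c) - c).

    alpha = 0 gives the floating-point clamp; alpha = 1 gives hard rounding.
    """
    c = clamp(x, lo, hi)
    return c + alpha * (hard_quant(c, lo, hi, bits) - c)


def backward(x, lo, hi, bits, grad_out=1.0):
    """STE gradients of q with respect to (x, lo, hi, alpha).

    The rounding error Q(c) - c is treated as a constant, so only the clamp
    and the alpha scaling carry gradient; the result does not depend on alpha.
    """
    c = clamp(x, lo, hi)
    noise = hard_quant(c, lo, hi, bits) - c
    g_x = grad_out if lo < x < hi else 0.0  # identity inside the range
    g_lo = grad_out if x <= lo else 0.0     # values clamped to lo follow lo
    g_hi = grad_out if x >= hi else 0.0     # values clamped to hi follow hi
    g_alpha = grad_out * noise              # scales the injected rounding noise
    return g_x, g_lo, g_hi, g_alpha
```

For example, with lo = -1, hi = 1 and bits = 1, `forward(0.3, -1.0, 1.0, 1, 1.0)` snaps 0.3 to the upper binary level 1.0 (up to float rounding), while `alpha = 0.0` returns 0.3 unchanged. In a real training loop these hand-written gradients would normally come from an autodiff framework's stop-gradient operation rather than being coded by hand.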

Taxonomy

Topics: Neural Networks and Applications