Scheduling Weight Transitions for Quantization-Aware Training

Junghyup Lee; Jeimin Jeon; Dohyung Kim; Bumsub Ham

arXiv:2404.19248·cs.CV·October 2, 2025

TL;DR

This paper introduces a transition rate scheduling method for quantization-aware training that explicitly controls how many quantized weights change discrete levels, rather than relying on a manually scheduled learning rate to control those changes indirectly.

Contribution

The paper proposes a transition rate scheduling technique, together with a transition-adaptive learning rate, to control the changes of quantized weights during training.

Findings

01

Improved quantization accuracy on standard benchmarks.

02

Effective control of weight transitions enhances training stability.

03

Decoupling LR from weight transitions benefits QAT performance.

Abstract

Quantization-aware training (QAT) simulates a quantization process during training to lower the bit-precision of weights/activations. It learns quantized weights indirectly by updating latent weights, i.e., full-precision inputs to a quantizer, using gradient-based optimizers. We claim that coupling a user-defined learning rate (LR) with these optimizers is sub-optimal for QAT. Quantized weights transit discrete levels of a quantizer only if the corresponding latent weights pass transition points, where the quantizer changes discrete states. This suggests that the changes of quantized weights are affected by both the LR for latent weights and their distributions. It is thus difficult to control the degree of changes for quantized weights by scheduling the LR manually. We conjecture that the degree of parameter changes in QAT is related to the number of quantized weights transiting discrete…
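
The abstract's point that the same LR can change very different numbers of quantized weights, depending on where the latent weights sit relative to the transition points, can be illustrated with a small sketch. This is not the authors' implementation: the uniform quantizer, the step size of 0.25, the plain SGD update, the synthetic Gaussian gradients, and the helper names `quantize_level`, `sgd_step`, and `transition_rate` are all assumptions made for illustration. Here the transition rate is taken to be the fraction of weights whose discrete level changes after one update.

```python
import math
import random

STEP = 0.25  # assumed quantizer step size (hypothetical)

def quantize_level(w, step=STEP):
    """Index of the nearest discrete level for latent weight w (uniform quantizer)."""
    return math.floor(w / step + 0.5)

def sgd_step(weights, grads, lr):
    """One plain SGD update on the latent (full-precision) weights."""
    return [w - lr * g for w, g in zip(weights, grads)]

def transition_rate(before, after, step=STEP):
    """Fraction of weights whose quantized level differs between two latent states."""
    changed = sum(
        quantize_level(b, step) != quantize_level(a, step)
        for b, a in zip(before, after)
    )
    return changed / len(before)

rng = random.Random(0)
n = 10_000
lr = 0.01
grads = [rng.gauss(0.0, 1.0) for _ in range(n)]  # synthetic gradients

# Case A: latent weights clustered at level centers, far from transition points.
centered = [STEP * rng.randint(-2, 2) + rng.uniform(-0.02, 0.02) for _ in range(n)]

# Case B: latent weights clustered around transition points (midway between levels).
near_edge = [
    STEP * rng.randint(-2, 2) + STEP / 2 + rng.uniform(-0.02, 0.02)
    for _ in range(n)
]

# Same LR and same gradients, different latent-weight distributions.
rate_centered = transition_rate(centered, sgd_step(centered, grads, lr))
rate_near_edge = transition_rate(near_edge, sgd_step(near_edge, grads, lr))

print(f"transition rate, weights near level centers:     {rate_centered:.4f}")
print(f"transition rate, weights near transition points: {rate_near_edge:.4f}")
```

With identical LR and gradients, the second case yields many more level changes than the first, which is why scheduling the LR alone gives only indirect control over how much the quantized weights change.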

Taxonomy

Topics: Distributed and Parallel Computing Systems