When Flat Minima Fail: Characterizing INT4 Quantization Collapse After FP32 Convergence
Marcus Armstrong

TL;DR
This paper shows that INT4 post-training quantization can fail after FP32 convergence, following a three-phase divergence pattern, and proposes a learning rate schedule to mitigate the failure, supported by analysis of 154 Pythia-160m training checkpoints.
Contribution
It characterizes how INT4 quantization diverges after FP32 convergence and introduces a schedule that reduces the divergence, supported by analysis of 154 checkpoints.
Findings
INT4 quantization divergence begins after FP32 convergence
INT8 quantization remains stable throughout all phases
Oscillatory Lock-In schedule reduces INT4 divergence by 2.2 percentage points
Abstract
Post-training quantization (PTQ) assumes that a well-converged model is a quantization-ready model. We show this assumption fails in a structured, measurable, and previously uncharacterized way. Using a calibration-free per-group INT4 probe applied to all 154 publicly available Pythia-160m training checkpoints, we identify a three-phase divergence structure: a rapid-learning phase where both FP32 perplexity and quantization robustness improve together, a meta-stable plateau lasting roughly 70,000 steps where FP32 perplexity stagnates but the INT4 gap remains bounded, and an explosive divergence phase where the INT4 gap compounds from 11% to 517% while FP32 perplexity barely moves. Critically, this divergence begins not when the learning rate starts decaying, but precisely when FP32 perplexity converges, a finer-grained onset predictor that implies post-convergence weight updates, rather than…
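To make the "calibration-free per-group INT4 probe" and the "INT4 gap" concrete, here is a minimal sketch of what such a probe could look like. It is not the paper's implementation: the abstract does not give the group size, the scaling scheme, or the exact gap definition, so the group size of 128, the symmetric round-to-nearest scheme, and the relative-perplexity formula are all assumptions. The function names `fake_quant_int4` and `relative_gap` are hypothetical.

```python
import numpy as np


def fake_quant_int4(w, group_size=128):
    """Quantize-dequantize a weight tensor to INT4, one scale per group.

    Assumptions (not from the paper): symmetric round-to-nearest,
    groups of `group_size` consecutive weights, no calibration data.
    """
    w = np.asarray(w, dtype=np.float32)
    if w.size % group_size != 0:
        raise ValueError("tensor size must be divisible by group_size")
    groups = w.reshape(-1, group_size)
    # One scale per group, chosen so the largest magnitude maps to level 7.
    max_abs = np.abs(groups).max(axis=1, keepdims=True)
    scale = np.where(max_abs > 0, max_abs / 7.0, 1.0)
    # Signed 4-bit integers cover [-8, 7].
    q = np.clip(np.round(groups / scale), -8, 7)
    return (q * scale).reshape(w.shape).astype(np.float32)


def relative_gap(ppl_fp32, ppl_int4):
    """Relative perplexity increase caused by quantization (assumed definition)."""
    return (ppl_int4 - ppl_fp32) / ppl_fp32
```

In use, one would apply `fake_quant_int4` to each weight matrix of a checkpoint, evaluate perplexity with the original and the quantized weights, and pass both values to `relative_gap`. Under this assumed definition, a gap of 517% would mean the INT4 perplexity is roughly 6.17 times the FP32 perplexity.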
