When Flat Minima Fail: Characterizing INT4 Quantization Collapse After FP32 Convergence

Marcus Armstrong

arXiv:2604.15167·cs.LG·April 17, 2026

TL;DR

INT4 post-training quantization can fail even after a model has converged in FP32. The paper identifies a three-phase divergence pattern across Pythia-160m training checkpoints and proposes a learning rate schedule that mitigates the problem.

Contribution

The paper characterizes how INT4 quantization diverges after FP32 convergence and introduces a schedule that reduces this divergence, based on an analysis of all 154 publicly available Pythia-160m training checkpoints.

Findings

01 INT4 quantization divergence begins after FP32 convergence

02 INT8 quantization remains stable throughout all phases

03 Oscillatory Lock-In schedule reduces INT4 divergence by 2.2 percentage points

Abstract

Post-training quantization (PTQ) assumes that a well-converged model is a quantization-ready model. We show this assumption fails in a structured, measurable, and previously uncharacterized way. Using a calibration-free per-group INT4 probe applied to all 154 publicly available Pythia-160m training checkpoints, we identify a three-phase divergence structure: a rapid-learning phase where FP32 perplexity and quantization robustness improve together, a meta-stable plateau lasting roughly 70,000 steps where FP32 perplexity stagnates but the INT4 gap remains bounded, and an explosive divergence phase where the INT4 gap compounds from 11% to 517% while FP32 perplexity barely moves. Critically, this divergence begins not when the learning rate starts decaying, but precisely when FP32 perplexity converges, a finer-grained onset predictor that implies post-convergence weight updates, rather than…
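The abstract does not spell out the probe, so the following is only a minimal sketch of what a calibration-free per-group INT4 probe could look like. It assumes symmetric round-to-nearest quantization with one scale per group, derived from the weights alone with no calibration data. The function names, the group size of 32, and the definition of the gap as the relative perplexity increase over FP32 are illustrative assumptions, not details taken from the paper.

```python
import numpy as np


def fake_quantize_per_group(weights, bits=4, group_size=32):
    """Quantize then dequantize weights with one symmetric scale per group.

    Assumption: round-to-nearest with a max-abs scale; the paper's exact
    scheme and group size are not given in the abstract.
    """
    w = np.asarray(weights, dtype=np.float32)
    flat = w.reshape(-1)
    if flat.size % group_size != 0:
        raise ValueError("weight count must be divisible by group_size")
    groups = flat.reshape(-1, group_size)

    qmax = 2 ** (bits - 1) - 1  # 7 for INT4, 127 for INT8
    max_abs = np.abs(groups).max(axis=1, keepdims=True)
    # Guard all-zero groups so the division below is well defined.
    scale = np.where(max_abs > 0, max_abs / qmax, 1.0)

    q = np.clip(np.round(groups / scale), -qmax - 1, qmax)
    return (q * scale).reshape(w.shape).astype(np.float32)


def relative_gap(ppl_fp32, ppl_quant):
    """Relative perplexity increase of the quantized model (assumed gap metric)."""
    return (ppl_quant - ppl_fp32) / ppl_fp32


# Example: compare INT4 and INT8 reconstruction error on random weights.
rng = np.random.default_rng(0)
w = rng.normal(size=(8, 64)).astype(np.float32)
err4 = np.abs(w - fake_quantize_per_group(w, bits=4)).max()
err8 = np.abs(w - fake_quantize_per_group(w, bits=8)).max()
print(f"max abs error: INT4={err4:.4f}, INT8={err8:.4f}")
```

To turn this into a checkpoint probe, one would apply `fake_quantize_per_group` to each weight matrix of a checkpoint, evaluate perplexity with the original and the quantized weights, and track `relative_gap` across training steps. Under this assumed definition, a gap of 11% means the quantized perplexity is 1.11 times the FP32 perplexity.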
