Uncertainty Drives Social Bias Changes in Quantized Large Language Models
Stanley Z. Hua, Sanae Lotfi, Irene Y. Chen

TL;DR
This study shows that quantizing large language models substantially alters their social biases, that these changes are driven by model uncertainty, and that aggregate bias metrics fail to capture them.
Contribution
The authors present what they describe as the first large-scale analysis of how post-training quantization causes bias flips and asymmetric bias shifts, and they argue for targeted bias evaluation after quantization.
Findings
Quantization induces bias flips in up to 21% of responses.
High-uncertainty responses are 3-11x more likely to change bias states.
4-bit models show 4-6x more behavioral changes than 8-bit models.
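The "3-11x more likely" finding is a ratio of flip rates between two groups of responses. The sketch below shows one way such a ratio could be computed. The uncertainty scores, the threshold of 0.5, and the function name `flip_rate_ratio` are all illustrative assumptions, not the paper's method or data.

```python
def flip_rate_ratio(uncertainty, flipped, threshold):
    """Ratio of flip rates: high-uncertainty responses vs. confident ones.

    uncertainty: per-response uncertainty scores (hypothetical scale, 0 to 1)
    flipped:     1 if the response changed bias state after quantization, else 0
    threshold:   assumed cutoff separating "high uncertainty" from "confident"
    """
    if len(uncertainty) != len(flipped):
        raise ValueError("uncertainty and flipped must have the same length")
    high = [f for u, f in zip(uncertainty, flipped) if u >= threshold]
    low = [f for u, f in zip(uncertainty, flipped) if u < threshold]
    if not high or not low:
        raise ValueError("both groups must be non-empty")
    high_rate = sum(high) / len(high)
    low_rate = sum(low) / len(low)
    if low_rate == 0:
        raise ValueError("confident group has no flips; ratio is undefined")
    return high_rate / low_rate


# Made-up toy data: 4 high-uncertainty responses, 8 confident ones.
uncertainty = [0.9, 0.8, 0.85, 0.7, 0.1, 0.2, 0.15, 0.05, 0.3, 0.25, 0.1, 0.2]
flipped = [1, 1, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0]

ratio = flip_rate_ratio(uncertainty, flipped, threshold=0.5)
print(ratio)  # 0.75 / 0.125 = 6.0
```

In this toy data, 3 of 4 high-uncertainty responses flip (0.75) against 1 of 8 confident responses (0.125), so the ratio is 6.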
Abstract
Post-training quantization reduces the computational cost of large language models but fundamentally alters their social biases in ways that aggregate metrics fail to capture. We present the first large-scale study of 50 quantized models evaluated on PostTrainingBiasBench, a unified benchmark of 13 closed- and open-ended bias datasets. We identify a phenomenon we term quantization-induced masked bias flipping, in which up to 21% of responses flip between biased and unbiased states after quantization, despite showing no change in aggregate bias scores. These flips are strongly driven by model uncertainty, where the responses with high uncertainty are 3-11x more likely to change than the confident ones. Quantization strength amplifies this effect, with 4-bit quantized models exhibiting 4-6x more behavioral changes than 8-bit quantized models. Critically, these changes create asymmetric…
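The masked flipping described in the abstract can be illustrated with a small sketch. It assumes each response carries a binary bias label (1 = biased, 0 = unbiased) and that responses are paired across the full-precision and quantized models. The labels and the function names `aggregate_bias` and `flip_rate` are hypothetical and are not taken from PostTrainingBiasBench.

```python
def aggregate_bias(labels):
    """Fraction of responses labeled biased (1 = biased, 0 = unbiased)."""
    return sum(labels) / len(labels)


def flip_rate(before, after):
    """Fraction of paired responses whose bias label changed."""
    if len(before) != len(after):
        raise ValueError("before and after must be paired, same length")
    return sum(b != a for b, a in zip(before, after)) / len(before)


# Made-up labels for the same 10 prompts, before and after quantization.
full_precision = [1, 1, 0, 0, 1, 0, 0, 1, 0, 0]
quantized = [1, 0, 1, 0, 1, 0, 0, 1, 0, 0]

print(aggregate_bias(full_precision))  # 0.4
print(aggregate_bias(quantized))       # 0.4
print(flip_rate(full_precision, quantized))  # 0.2
```

Here one response flips from biased to unbiased and another flips the opposite way, so the aggregate score stays at 0.4 while 20% of individual responses changed state. Only a paired, per-response comparison reveals the change.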
Peer Reviews
Decision: Submitted to ICLR 2026
* The paper is overall well-written and presented, with clear claims of contributions and explanation of methodology/experimental setting.
* The background is very well written and appears to cover much of the relevant literature on quantization and bias in LLMs (if not uncertainty/bias as discussed in weaknesses)
* The paper is very well motivated with post-quantization of LLM models being pervasive in application and research, although the study of bias in this context is far from a unique mot
* Several of the claimed contributions in terms of conclusions appear far from novel, in particular:
- per-group evaluation differs from aggregate evaluation significantly; this is what motivates all work on fairness/bias analysis.
- impact on different social groups is asymmetrical; again, this seems to be a consistent finding in other work.
I think if the authors instead claim to have replicated these findings at a larger/broader scale than existing work, that would be fair, but that's
The main strength of this paper is its extensive study. The authors clearly invested significant time to investigate this phenomenon. They experimented with a large number of datasets, models, and quantization methods, and they evaluated the results using statistical tests. This thoroughness is valuable.
- The core idea, that model compression affects social bias due to uncertainty of the models, is not entirely new. Prior work, like Gonçalves & Strubell (2023), Zakizadeh et al. (2023) and Delobelle and Berendt (2022), has shown similar effects for distillation and other compression methods. This study feels like an extension of this known phenomenon to a wider set of quantization techniques. However, I appreciate that the authors clearly position their work among these related studies.
- The a
Comprehensive empirical study and strong execution: The paper conducts a systematic evaluation of how quantization affects bias behavior across multiple LLMs, quantization levels, and bias benchmarks. The experiments are well-organized and presented clearly, allowing readers to trace how quantization alters model outputs in nuanced ways. This level of empirical thoroughness is rare in the fairness–quantization intersection. The paper is well-written, transparent about its setup, and likely repr
Mostly an empirical study and limited novelty: The paper offers a careful empirical analysis but no substantial methodological or theoretical innovation. The link between quantization, calibration drift, and fairness degradation has been noted in prior work, and the “uncertainty drives bias change” framing mainly restates these existing insights.
No actionable mitigation or theoretical advancement: The study refines measurement rather than introducing new modeling ideas or actionable mitigat
Taxonomy
Topics: Artificial Intelligence in Healthcare and Education · Topic Modeling · Explainable Artificial Intelligence (XAI)
