TL;DR
This study shows that aggressive quantization of large language models can introduce new biased, stereotypical behaviors that standard metrics such as perplexity fail to detect, highlighting the need for bias-aware compression methods.
Contribution
It provides a comprehensive empirical analysis of bias emergence in quantized LLMs across multiple models, precision levels, and bias benchmarks, revealing hidden fairness risks.
Findings
3-bit quantization causes 6-21% of previously unbiased items to develop new stereotypical behaviors
Perplexity metrics fail to detect bias emergence at lower precisions
A significant portion of items develop biases at 4-bit quantization despite minimal perplexity increase
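As a rough illustration of how an emergence rate like the one above could be computed, the sketch below counts items that were unbiased at full precision and became biased after quantization. The function name, the per-item boolean labels, and the toy data are all hypothetical; the paper's exact definition of a biased item is not given in this summary.

```python
def bias_emergence_rate(baseline_biased, quantized_biased):
    """Fraction of items unbiased at baseline that are biased after quantization.

    Both arguments are hypothetical per-item boolean labels in the same order.
    """
    if len(baseline_biased) != len(quantized_biased):
        raise ValueError("label lists must have the same length")
    # Only items that were unbiased at full precision can "develop" a bias.
    eligible = [q for b, q in zip(baseline_biased, quantized_biased) if not b]
    if not eligible:
        return 0.0
    return sum(eligible) / len(eligible)


# Toy labels, invented for illustration only (not data from the paper).
baseline = [False, False, False, False, True]
quantized = [False, True, False, False, True]
rate = bias_emergence_rate(baseline, quantized)
print(f"{rate:.0%} of previously unbiased items became biased")
```

Restricting the denominator to items that were unbiased at baseline is what separates newly emerged bias from bias the full-precision model already had, which an aggregate bias score would mix together.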
Abstract
Large Language Models are routinely compressed via post-training quantization to reduce inference costs and memory footprint for cloud and edge deployment, yet the impact of this compression on model quality remains poorly understood. Existing studies typically compare only two conditions (full-precision vs. a single quantized variant), rely on aggregate bias metrics, and evaluate a single model family, making it impossible to distinguish gradual degradation from threshold-dependent safety failures. We conduct a controlled empirical study of three instruction-tuned models (Qwen2.5-7B, Mistral-7B, Phi-3.5-mini) at five precision levels (BF16 through 3-bit) on 12,148 BBQ bias benchmark items across 5 random seeds, totaling 911,100 inference records. Our results reveal that 3-bit quantization causes 6-21% of previously unbiased items to develop new stereotypical behaviors, following a…
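The record count in the abstract follows directly from the study design. A quick check, assuming every model is evaluated at every precision level on every item and seed (the precision labels are taken from the repository names listed under Code & Models):

```python
# Study design as stated in the abstract.
models = ["Qwen2.5-7B", "Mistral-7B", "Phi-3.5-mini"]
precisions = ["bf16", "q8", "q6", "q4", "q3"]  # labels from the repository names
bbq_items = 12_148
seeds = 5

# Full factorial design: one inference record per combination.
total_records = len(models) * len(precisions) * bbq_items * seeds
print(total_records)  # 911100
```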
Code & Models
- 🤗 plawanrath/qwen2.5-7b-instruct-bf16-mlx-cbamodel · 36 dl
- 🤗 plawanrath/qwen2.5-7b-instruct-q8-mlx-cbamodel · 110 dl
- 🤗 plawanrath/qwen2.5-7b-instruct-q6-mlx-cbamodel · 49 dl
- 🤗 plawanrath/qwen2.5-7b-instruct-q4-mlx-cbamodel · 122 dl
- 🤗 plawanrath/qwen2.5-7b-instruct-q3-mlx-cbamodel · 146 dl
- 🤗 plawanrath/mistral-7b-instruct-v0.3-bf16-mlx-cbamodel · 413 dl
- 🤗 plawanrath/mistral-7b-instruct-v0.3-q8-mlx-cbamodel · 372 dl
- 🤗 plawanrath/mistral-7b-instruct-v0.3-q6-mlx-cbamodel · 245 dl
- 🤗 plawanrath/mistral-7b-instruct-v0.3-q4-mlx-cbamodel · 282 dl
- 🤗 plawanrath/mistral-7b-instruct-v0.3-q3-mlx-cbamodel · 417 dl
