Preserving Fairness and Safety in Quantized LLMs Through Critical Weight Protection
Muhammad Alif Al Hakim, Alfan Farizki Wicaksono, Fajri Koto

TL;DR
This paper investigates how quantization affects fairness and safety in large language models across multiple languages and introduces Critical Weight Protection to mitigate these issues effectively.
Contribution
The paper systematically studies how static and dynamic quantization affect fairness and safety, and proposes a novel method that preserves critical weights during quantization.
Findings
Quantization consistently degrades fairness and safety; fairness degradation varies across languages, while safety degradation is most pronounced in non-English settings.
Dynamic quantization methods are more stable than static ones.
Critical Weight Protection mitigates bias and safety issues without retraining.
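The general idea of shielding a subset of weights from quantization can be sketched as follows. This is only an illustration under stated assumptions, not the authors' method: this summary does not describe how Critical Weight Protection identifies critical weights, so the `scores` argument is a hypothetical per-weight importance value supplied by the caller, and `quantize_symmetric`, `quantize_with_protection`, and `protect_fraction` are names invented for this example. The quantizer is plain round-to-nearest symmetric quantization, chosen for brevity.

```python
from typing import List, Sequence, Set, Tuple


def quantize_symmetric(weights: Sequence[float], bits: int = 4) -> List[float]:
    """Round-to-nearest symmetric quantization, returned in dequantized form."""
    qmax = 2 ** (bits - 1) - 1
    max_abs = max((abs(w) for w in weights), default=0.0)
    if max_abs == 0.0:
        return [0.0 for _ in weights]
    scale = max_abs / qmax
    # Map each weight to an integer level, clamp, then map back to a float.
    return [max(-qmax, min(qmax, round(w / scale))) * scale for w in weights]


def quantize_with_protection(
    weights: Sequence[float],
    scores: Sequence[float],  # hypothetical importance scores (assumption)
    protect_fraction: float = 0.1,
    bits: int = 4,
) -> Tuple[List[float], Set[int]]:
    """Quantize all weights except the top-scoring fraction, which stay untouched."""
    if len(weights) != len(scores):
        raise ValueError("weights and scores must have the same length")
    n_protect = int(len(weights) * protect_fraction)
    ranked = sorted(range(len(weights)), key=lambda i: scores[i], reverse=True)
    protected = set(ranked[:n_protect])

    # Quantize only the unprotected weights, so protected values
    # do not influence the quantization scale.
    rest_idx = [i for i in range(len(weights)) if i not in protected]
    rest_q = quantize_symmetric([weights[i] for i in rest_idx], bits)

    out = list(weights)
    for i, q in zip(rest_idx, rest_q):
        out[i] = q
    return out, protected
```

For example, calling `quantize_with_protection([0.1, -0.2, 3.0, 0.05], scores=[0.1, 0.2, 3.0, 0.05], protect_fraction=0.25)` leaves the weight at index 2 at full precision and quantizes the other three. Using magnitude as the score here is a placeholder; any real criterion for "critical" weights would replace it.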
Abstract
Quantization is widely adopted to reduce the computational cost of large language models (LLMs); however, its implications for fairness and safety, particularly in dynamic quantization and multilingual contexts, remain underexplored. In this work, we conduct a systematic study of how static and dynamic quantization methods impact fairness and safety across benchmarks measuring intrinsic and extrinsic bias and safety alignment. For fairness, we evaluate English, French, Dutch, Spanish, and Turkish; for safety, we focus on English, Korean, and Arabic. Our findings reveal that quantization consistently degrades fairness and safety, with dynamic methods demonstrating greater stability than static ones. Moreover, fairness degradation varies across languages, while safety deterioration is especially pronounced in non-English settings. To address these risks, we introduce Critical Weight…
Taxonomy
Topics: Ethics and Social Impacts of AI · Adversarial Robustness in Machine Learning · Explainable Artificial Intelligence (XAI)
