Investigating the Impact of Quantization Methods on the Safety and Reliability of Large Language Models

Artyom Kharinaev; Viktor Moskvoretskii; Egor Shvetsov; Kseniia Studenikina; Bykov Mikhail; Evgeny Burnaev

arXiv:2502.15799·cs.CR·July 1, 2025

TL;DR

This paper evaluates how different quantization methods affect the safety and reliability of large language models, revealing trade-offs and the need for safety-aware compression strategies.

Contribution

It introduces OpenMiniSafety, a new safety dataset, and provides a comprehensive evaluation of quantization impacts on LLM safety across multiple models and benchmarks.

Findings

1. Quantization can degrade the safety alignment of LLMs.

2. No single quantization method consistently outperforms the others on safety.

3. Precision-specific quantization methods excel at their target bit-widths.

Abstract

Large Language Models (LLMs) are powerful tools for modern applications, but their computational demands limit accessibility. Quantization offers efficiency gains, yet its impact on safety and trustworthiness remains poorly understood. To address this, we introduce OpenMiniSafety, a human-curated safety dataset with 1,067 challenging questions to rigorously evaluate model behavior. We publicly release human safety evaluations for four LLMs (both quantized and full-precision), totaling 4,268 annotated question-answer pairs. By assessing 66 quantized variants of these models using four post-training quantization (PTQ) and two quantization-aware training (QAT) methods across four safety benchmarks, including human-centric evaluations, we uncover critical safety performance trade-offs. Our results show that both PTQ and QAT can degrade safety alignment, with QAT techniques like QLORA or STE…
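For readers unfamiliar with post-training quantization, the following is a minimal sketch of symmetric round-to-nearest weight quantization, the simplest form of PTQ. It is an illustration only and not one of the methods evaluated in the paper, which the abstract does not name; the function names `quantize_rtn` and `dequantize` and the sample weights are hypothetical.

```python
def quantize_rtn(weights, bits=4):
    """Symmetric round-to-nearest quantization of a list of floats.

    Hypothetical helper for illustration; not taken from the paper.
    Returns integer codes and the scale needed to map them back.
    """
    qmax = 2 ** (bits - 1) - 1              # largest signed code, e.g. 7 for 4 bits
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Round each weight to the nearest code and clamp to the representable range.
    codes = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return codes, scale


def dequantize(codes, scale):
    """Map integer codes back to approximate floating-point weights."""
    return [c * scale for c in codes]


# Made-up sample weights, chosen only to show the rounding error.
weights = [0.82, -0.31, 0.05, -1.40, 0.67]
codes, scale = quantize_rtn(weights, bits=4)
restored = dequantize(codes, scale)
max_error = max(abs(w - r) for w, r in zip(weights, restored))
```

Each restored weight differs from the original by at most half the scale. Rounding error of this kind, accumulated over billions of weights, is the perturbation whose effect on safety alignment the paper measures.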

Code & Models

Repositories

On-Point-RND/OpenSafetyMini-Investigating-the-Impact-of-Quantization-Methods-on-the-Safety-and-Reliability-of-LLM
Official

Taxonomy

Topics: Topic Modeling

Methods: LLaMA