Q-resafe: Assessing Safety Risks and Quantization-aware Safety Patching for Quantized Large Language Models

Kejia Chen; Jiawen Zhang; Jiacong Hu; Yu Wang; Jian Lou; Zunlei Feng; Mingli Song

arXiv:2506.20251 · cs.LG · June 26, 2025

TL;DR

This paper evaluates the safety risks introduced by quantization in large language models and proposes Q-resafe, a framework to restore safety capabilities with minimal utility loss, validated through extensive experiments.

Contribution

It provides the first comprehensive safety assessment of quantized LLMs and introduces a novel safety patching framework tailored for quantization-induced vulnerabilities.

Findings

01. Q-resafe effectively restores safety to quantized LLMs.

02. Quantization can significantly impair LLM safety capabilities.

03. The framework maintains model utility while improving safety.
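Finding 02 compares a model's safety before and after quantization. This page does not say how safety is scored. The following minimal sketch therefore rests on an assumption: it uses keyword-based refusal matching, a simple proxy common in harmful-prompt evaluations, rather than the benchmarks or judges the paper uses. `REFUSAL_MARKERS`, `is_refusal`, `refusal_rate`, and `safety_drop` are hypothetical names introduced for illustration, and the example responses are toy strings, not model outputs.

```python
from typing import Sequence

# Hypothetical marker list (an assumption, not the paper's): lower-cased phrases that
# often signal a refusal. Real evaluations frequently use longer lists or a judge model.
REFUSAL_MARKERS = ("i'm sorry", "i am sorry", "i cannot", "i can't", "as an ai")


def is_refusal(response: str) -> bool:
    """Return True if any refusal marker appears in the response (case-insensitive)."""
    text = response.lower()
    return any(marker in text for marker in REFUSAL_MARKERS)


def refusal_rate(responses: Sequence[str]) -> float:
    """Fraction of responses to harmful prompts that are refusals."""
    if not responses:
        raise ValueError("need at least one response")
    return sum(is_refusal(r) for r in responses) / len(responses)


def safety_drop(full_precision: Sequence[str], quantized: Sequence[str]) -> float:
    """Percentage-point drop in refusal rate from the full-precision to the quantized model.

    Both sequences must hold responses to the same harmful prompts, in the same order.
    """
    if len(full_precision) != len(quantized):
        raise ValueError("responses must cover the same prompts")
    return 100 * (refusal_rate(full_precision) - refusal_rate(quantized))


if __name__ == "__main__":
    # Toy responses for illustration only; not outputs of any real model.
    full = ["I'm sorry, but I can't help with that.", "I cannot assist with this request."]
    quant = ["I'm sorry, I can't help with that.", "Sure, here is an overview."]
    print(f"refusal drop: {safety_drop(full, quant):.1f} points")  # prints "refusal drop: 50.0 points"
```

A keyword list like this misses partial compliance and can misread unusual refusals. The sketch only shows how a before/after comparison on a fixed prompt set is structured, not how the paper scores safety.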

Abstract

Quantized large language models (LLMs) have gained increasing attention and significance for enabling deployment in resource-constrained environments. However, emerging studies of a few calibration-dataset-free quantization methods suggest that quantization may compromise the safety capabilities of LLMs, underscoring the urgent need for systematic safety evaluations and effective mitigation strategies. In this paper, we present comprehensive safety evaluations across various mainstream quantization techniques and diverse calibration datasets, utilizing widely accepted safety benchmarks. To address the identified safety vulnerabilities, we propose a quantization-aware safety patching framework, Q-resafe, to efficiently restore the safety capabilities of quantized LLMs while minimizing any adverse impact on utility. Extensive experimental results demonstrate that Q-resafe successfully…
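To make "quantization" and "bit-width" concrete, the sketch below applies plain symmetric round-to-nearest (RTN) quantization to a toy weight vector and measures the reconstruction error at 8 and 4 bits. It is an illustration under stated assumptions. It is not Q-resafe and not any of the quantization methods the paper evaluates. `quantize_rtn`, `dequantize`, and `mean_abs_error` are hypothetical helpers. The sketch shows only that fewer bits yield coarser weights, the kind of perturbation whose effect on safety the paper measures.

```python
import random


def quantize_rtn(weights, bits):
    """Symmetric per-tensor round-to-nearest quantization to signed `bits`-bit integers (bits >= 2).

    Returns the integer codes and the scale that maps them back to floats.
    """
    qmax = 2 ** (bits - 1) - 1  # largest code: 127 for 8 bits, 7 for 4 bits
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / qmax if max_abs > 0 else 1.0  # avoid division by zero for all-zero input
    # Round each scaled weight to the nearest integer and clamp to [-qmax, qmax].
    codes = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return codes, scale


def dequantize(codes, scale):
    """Map integer codes back to approximate float weights."""
    return [c * scale for c in codes]


def mean_abs_error(original, reconstructed):
    """Average absolute difference between original and reconstructed weights."""
    return sum(abs(a - b) for a, b in zip(original, reconstructed)) / len(original)


if __name__ == "__main__":
    random.seed(0)
    # Toy weights drawn from a small Gaussian; real LLM layers are far larger.
    w = [random.gauss(0.0, 0.02) for _ in range(4096)]
    for bits in (8, 4):
        codes, scale = quantize_rtn(w, bits)
        err = mean_abs_error(w, dequantize(codes, scale))
        print(f"{bits}-bit: mean |w - w_hat| = {err:.2e}")
```

The 4-bit error comes out much larger than the 8-bit error, because the quantization step grows from `max|w|/127` to `max|w|/7`.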

Peer Reviews

Decision: ICLR 2025 Conference Withdrawn Submission

Reviewer 01 · Rating 6 · Confidence 4

Strengths

+ Comprehensiveness: The study covers all four mainstream categories of LLM quantization techniques: two post-training quantization techniques and two quantization-aware training/fine-tuning techniques. For the techniques that need an additional quantization-assisting dataset, the paper uses three datasets with varying safety risk levels: a directly harmful dataset, an indirectly harmful dataset, and a benign dataset. The quantized LLMs are evaluated at two commonly adopted bit-widths.
+ T…

Weaknesses

The paper claims that "Although preliminary literature reports scattered evidence of safety evaluations for quantized LLMs, they are not systematic enough to support a well-rounded evaluation". Its primary claim to novelty appears to be that it is the first to study the safety impact of quantized LLMs. Here is a list of works studying different aspects of quantization's impact on safety and alignment:
- Belkhiter, Yannis, Giulio Zizzo, and Sergio Maffeis. "HarmLevelBench: Evaluating Harm…

Reviewer 02 · Rating 5 · Confidence 4

Strengths

The paper studies an important but relatively underexplored problem. The evaluation of existing quantization approaches clearly demonstrates the safety issues of quantization and Q-resafe gives significant benefits.

Weaknesses

- The paper claims to be the first systematic assessment of the safety risks of quantization. However, I am aware of at least two prior papers in this direction [1, 2]. I suggest the authors conduct a more comprehensive literature review and adjust the claim.
- While the paper studies more advanced quantization algorithms, it does not cover algorithms that are already popular, such as LLM.int8(), NF4, and FP4, implemented in the bitsandbytes library. The safety issues of these popular algorithms co…

Reviewer 03 · Rating 5 · Confidence 3

Strengths

1. This paper raises an intriguing research question: how does quantization impact LLM safety performance? To answer it, the work conducts a comprehensive measurement across different quantization settings.
2. The proposed Q-resafe method is effective. It almost perfectly preserves the safety ability of quantized LLMs in the studied safety evaluation.

Weaknesses

1. **Intriguing but less novel topic.** Many works have already revealed the safety risks that come with quantization, from the vision field (e.g., https://dl.acm.org/doi/10.1145/3485832.3485881) to recent LLM research (e.g., https://arxiv.org/abs/2405.18137). From another perspective, safety performance is just one intrinsic ability of an LLM, so the finding that safety degrades after quantization is not impressive.
2. **Writing is not self-contained.** The four considered qua…

Code & Models

Repositories

thecommonirin/qresafe
PyTorch · Official

Taxonomy

Topics: Software Reliability and Analysis Research · Topic Modeling

Methods: Activation Patching