Q-realign: Piggybacking Realignment on Quantization for Safe and Efficient LLM Deployment
Qitao Tan, Xiaoying Song, Ningxi Cheng, Ninghao Liu, Xiaoming Zhai, Lingzi Hong, Yanzhi Wang, Zhen Xiang, Geng Yuan

TL;DR
Q-realign is a post-hoc defense based on post-training quantization that restores safety alignment in fine-tuned large language models at deployment time, reducing unsafe behaviors without extensive retraining or high computational cost.
Contribution
The paper introduces a post-training quantization approach that decouples safety alignment from fine-tuning, enabling efficient and safe deployment of LLMs.
Findings
Significantly reduces unsafe behaviors in LLMs.
Preserves task performance after safety alignment.
Achieves safety recovery of a 7B LLM in 40 minutes.
Abstract
Public large language models (LLMs) are typically safety-aligned during pretraining, yet task-specific fine-tuning required for deployment often erodes this alignment and introduces safety risks. Existing defenses either embed safety recovery into fine-tuning or rely on fine-tuning-derived priors for post-hoc correction, leaving safety recovery tightly coupled with training and incurring high computational overhead and a complex workflow. To address these challenges, we propose Q-realign, a post-hoc defense method based on post-training quantization, guided by an analysis of representational structure. By reframing quantization as a dual-objective procedure for compression and safety, Q-realign decouples safety alignment from fine-tuning and naturally piggybacks into modern deployment pipelines. Experiments across multiple models and datasets demonstrate that our…
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Software-Defined Networks and 5G · Topic Modeling
