Quantized Delta Weight Is Safety Keeper
Yule Liu, Zhen Sun, Xinlei He, Xinyi Huang

TL;DR
This paper demonstrates that quantizing delta weights in fine-tuned language models can improve security against various attacks with minimal utility loss, offering a resource-efficient approach for secure model deployment.
Contribution
The paper shows that partial compression of delta weights makes fine-tuned models more robust to attacks, offering a new perspective on balancing resource demands and security in language model fine-tuning.
Findings
Partial compression reduces security vulnerabilities significantly.
With less than 10% utility loss, security risks decrease by more than 60%.
LogitLens visualization explains mechanisms of security improvement.
Abstract
Recent advancements in fine-tuning proprietary language models enable customized applications across various domains but also introduce two major challenges: high resource demands and security risks. Regarding resource demands, recent work proposes partial compression methods, such as BitDelta, that quantize the delta weights between the fine-tuned model and the base model. Regarding security risks, user-defined fine-tuning can introduce vulnerabilities such as alignment issues, backdoor attacks, and hallucinations. However, most current security assessments focus on full-precision or full-compression models; how partial compression methods affect these security concerns is not well discussed. To bridge this gap, we evaluate the robustness of delta-weight quantization against these security threats. In this paper, we uncover a "free lunch" phenomenon: partial…
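As a rough illustration of what "quantizing the delta weights" means, the sketch below compresses the difference between a fine-tuned weight vector and its base to one sign bit per entry plus a single shared scale. This is an assumption-laden toy, not the authors' implementation: the function names `sign_quantize_delta` and `reconstruct` are hypothetical, the mean absolute delta is used as the scale, and any calibration step a real method such as BitDelta may apply afterwards is omitted.

```python
def sign_quantize_delta(base, fine_tuned):
    """Hypothetical helper: 1-bit sign quantization of the delta weights.

    Returns one sign per weight and a single shared scale
    (assumed here to be the mean absolute delta).
    """
    # Delta weights: what fine-tuning changed relative to the base model.
    delta = [f - b for f, b in zip(fine_tuned, base)]
    # One full-precision scalar for the whole vector.
    scale = sum(abs(d) for d in delta) / len(delta)
    # One bit per weight: the direction of the change.
    signs = [1 if d >= 0 else -1 for d in delta]
    return signs, scale


def reconstruct(base, signs, scale):
    """Hypothetical helper: rebuild approximate fine-tuned weights."""
    return [b + scale * s for b, s in zip(base, signs)]


# Toy weights, invented for illustration only.
base = [0.50, -0.20, 0.10, 0.00]
fine_tuned = [0.53, -0.21, 0.08, 0.02]

signs, scale = sign_quantize_delta(base, fine_tuned)
approx = reconstruct(base, signs, scale)
print(signs, scale, approx)
```

The base weights stay in full precision and only the delta is compressed, which is why the abstract calls this partial compression, in contrast to quantizing the whole model.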
Taxonomy
Topics: Risk and Safety Analysis
Methods: Focus · Balanced Selection
