Safe LoRA: the Silver Lining of Reducing Safety Risks when Fine-tuning Large Language Models

Chia-Yi Hsu; Yu-Lin Tsai; Chih-Hsun Lin; Pin-Yu Chen; Chia-Mu Yu; Chun-Ying Huang

arXiv:2405.16833·cs.LG·January 7, 2025·2 cites

TL;DR

Safe LoRA is a simple, training-free modification to parameter-efficient fine-tuning that reduces safety risks in large language models while preserving their utility, particularly when the fine-tuning data contains malicious examples.

Contribution

It introduces a one-line patch to LoRA that projects the LoRA weights onto a safety-aligned subspace, improving safety without additional training or data.
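
To make the idea of a projection patch concrete, here is a minimal NumPy sketch of projecting a LoRA update onto a subspace. It is an illustration under stated assumptions, not the paper's implementation: this summary does not say how the safety-aligned subspace is built, so the matrix `V` whose columns span it, the helper names `subspace_projector` and `project_lora_update`, and all shapes are hypothetical.

```python
import numpy as np


def subspace_projector(V: np.ndarray) -> np.ndarray:
    """Orthogonal projector onto the column space of V.

    V is a hypothetical stand-in for a basis of the safety-aligned
    subspace; how that basis is obtained is not covered here.
    """
    # For full-column-rank V this equals V (V^T V)^{-1} V^T;
    # pinv also copes with rank-deficient V.
    return V @ np.linalg.pinv(V)


def project_lora_update(A: np.ndarray, B: np.ndarray, V: np.ndarray) -> np.ndarray:
    """Return the LoRA update B @ A after projection onto span(V)."""
    delta = B @ A                          # usual low-rank update, (d_out, d_in)
    return subspace_projector(V) @ delta   # the single extra projection step


# Toy shapes chosen only for illustration.
rng = np.random.default_rng(0)
d_out, d_in, rank, k = 6, 5, 2, 3
A = rng.normal(size=(rank, d_in))   # LoRA "down" factor
B = rng.normal(size=(d_out, rank))  # LoRA "up" factor
V = rng.normal(size=(d_out, k))     # assumed basis of the subspace

delta = B @ A
projected = project_lora_update(A, B, V)
residual = delta - projected        # the part of the update that was removed
```

The removed part `residual` is orthogonal to every column of `V`, so the projected update keeps only the component of the fine-tuning change that lies inside the assumed subspace. No gradient step or extra data is involved, which matches the training-free framing above.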

Findings

01 Retains safety performance on malicious data

02 Mitigates negative effects of malicious data in mixed datasets

03 Maintains downstream task performance

Abstract

While large language models (LLMs) such as Llama-2 or GPT-4 have shown impressive zero-shot performance, fine-tuning is still necessary to enhance their performance for customized datasets, domain-specific tasks, or other private needs. However, fine-tuning all parameters of LLMs requires significant hardware resources, which can be impractical for typical users. Therefore, parameter-efficient fine-tuning methods such as LoRA have emerged, allowing users to fine-tune LLMs without the need for considerable computing resources, with little performance degradation compared to fine-tuning all parameters. Unfortunately, recent studies indicate that fine-tuning can increase the risk to the safety of LLMs, even when the data does not contain malicious content. To address this challenge, we propose Safe LoRA, a simple one-liner patch to the original LoRA implementation by introducing the projection of LoRA…

Code & Models

Repositories

ibm/safelora · PyTorch · Official

Videos

Safe LoRA: The Silver Lining of Reducing Safety Risks when Finetuning Large Language Models · SlidesLive

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques

Methods: Attention Is All You Need · Linear Layer · Byte Pair Encoding · Label Smoothing · Adam · Residual Connection · Position-Wise Feed-Forward Layer · Multi-Head Attention · Dropout · Dense Connections