Safe Pruning LoRA: Robust Distance-Guided Pruning for Safety Alignment in Adaptation of LLMs

Shuang Ao; Yi Dong; Jinwei Hu; Sarvapali Ramchurn

arXiv:2506.18931·cs.LG·June 25, 2025

TL;DR

This paper introduces SPLoRA, a pruning method that improves safety alignment in LoRA-fine-tuned LLMs by removing the LoRA layers that weaken safety. It uses a new similarity metric, E-DIEM, to detect misalignment, and reports improved safety with preserved performance.

Contribution

We propose SPLoRA, a novel pruning approach built on the E-DIEM metric, to improve safety alignment in LoRA-fine-tuned LLMs while maintaining utility and reducing inference costs.

Findings

01 SPLoRA outperforms existing safety alignment methods.

02 It significantly reduces safety risks in LLMs.

03 It maintains or improves model performance.

Abstract

Fine-tuning Large Language Models (LLMs) with Low-Rank Adaptation (LoRA) enhances adaptability while reducing computational costs. However, fine-tuning can compromise safety alignment, even with benign data, increasing susceptibility to harmful outputs. Existing safety alignment methods struggle to capture complex parameter shifts, leading to suboptimal safety-utility trade-offs. To address this issue, we propose Safe Pruning LoRA (SPLoRA), a novel pruning-based approach that selectively removes LoRA layers that weaken safety alignment, improving safety while preserving performance. At its core, we introduce Empirical-DIEM (E-DIEM), a dimension-insensitive similarity metric that effectively detects safety misalignment in LoRA-adapted models. We conduct extensive experiments on LLMs fine-tuned with a mix of benign and malicious data, as well as with purely benign datasets, evaluating SPLoRA across…
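
The pruning idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the E-DIEM formula is not given on this page, so the code substitutes a simple stand-in distance (Euclidean distance between unit-normalized vectors, which stays in [0, 2] whatever the vector length). The function names, the per-layer "safety reference" vectors, and the threshold are all hypothetical and exist only for illustration.

```python
import math


def unit(vec):
    """Scale a vector to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def normalized_distance(a, b):
    """Stand-in metric (NOT E-DIEM): distance between unit-normalized vectors.

    The result lies in [0, 2] regardless of how many dimensions a and b have.
    """
    ua, ub = unit(a), unit(b)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(ua, ub)))


def prune_lora_layers(lora_deltas, safety_refs, threshold):
    """Drop LoRA layers whose update lies far from a safety reference.

    lora_deltas: layer name -> flattened LoRA weight update (hypothetical layout)
    safety_refs: layer name -> flattened reference vector (hypothetical)
    Returns (kept layer updates, names of pruned layers).
    """
    kept, pruned = {}, []
    for name, delta in lora_deltas.items():
        if normalized_distance(delta, safety_refs[name]) > threshold:
            pruned.append(name)  # treated as safety-weakening: update removed
        else:
            kept[name] = delta
    return kept, pruned


# Toy example: layer0 agrees with its reference, layer1 points elsewhere.
lora_deltas = {"layer0": [1.0, 0.0], "layer1": [0.0, 1.0]}
safety_refs = {"layer0": [2.0, 0.0], "layer1": [-1.0, 0.0]}
kept, pruned = prune_lora_layers(lora_deltas, safety_refs, threshold=1.0)
```

In this toy run only `layer1` is pruned. Removing a whole layer's LoRA update, rather than rescaling it, is also what would make the inference-cost reduction mentioned in the Contribution plausible, since the pruned layers fall back to the base weights.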

Code & Models

Repositories

aoshuang92/splora (PyTorch, official)

Videos

Safe Pruning LoRA: Robust Distance-Guided Pruning for Safety Alignment in Adaptation of LLMs · underline

Taxonomy

Topics: Semantic Web and Ontologies · Business Process Modeling and Analysis · AI-based Problem Solving and Planning

Methods: Pruning