S3LoRA: Safe Spectral Sharpness-Guided Pruning in Adaptation of Agent Planner

Shuang Ao; Gopal Rumchurn

arXiv:2508.15068 · cs.AI · August 22, 2025

TL;DR

S3LoRA is a lightweight, data-free framework that improves safety in LoRA-adapted LLMs by pruning potentially unsafe spectral components of the LoRA updates, with the aim of safer and more efficient agent planning.

Contribution

It introduces MAS-SVD (Magnitude-Aware Spherically Normalized SVD) and the SSI metric for safety-aware pruning of LoRA updates, requiring neither base-model checkpoints nor additional data.

Findings

01. Improves safety metrics in agent planning tasks

02. Maintains or enhances task performance

03. Reduces inference cost significantly

Abstract

Adapting Large Language Models (LLMs) using parameter-efficient fine-tuning (PEFT) techniques such as LoRA has enabled powerful capabilities in LLM-based agents. However, these adaptations can unintentionally compromise safety alignment, leading to unsafe or unstable behaviors, particularly in agent planning tasks. Existing safety-aware adaptation methods often require access to both base and instruction-tuned model checkpoints, which are frequently unavailable in practice, limiting their applicability. We propose S3LoRA (Safe Spectral Sharpness-Guided Pruning LoRA), a lightweight, data-free, and model-independent framework that mitigates safety risks in LoRA-adapted models by inspecting only the fine-tuned weight updates. We first introduce Magnitude-Aware Spherically Normalized SVD (MAS-SVD), which robustly analyzes the structural properties of LoRA updates while preserving global…
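
As a rough illustration of what inspecting only the fine-tuned weight updates can look like, the sketch below forms a dense LoRA update from its two low-rank factors, takes a plain SVD, and zeroes selected singular components. This is not the paper's MAS-SVD or its scoring rule, neither of which the truncated abstract specifies. The function names, the matrix shapes, and the choice of which component to drop are all assumptions made for illustration.

```python
import numpy as np

def lora_update(B, A):
    """Dense LoRA update delta_W = B @ A (B: d_out x r, A: r x d_in)."""
    return B @ A

def prune_spectral_components(delta_w, drop_indices):
    """Zero out the singular components listed in drop_indices (plain SVD).

    Hypothetical helper: how drop_indices is chosen is NOT taken from the
    paper; a real method would score components before pruning them.
    """
    U, s, Vt = np.linalg.svd(delta_w, full_matrices=False)
    s_pruned = s.copy()
    s_pruned[list(drop_indices)] = 0.0
    # Rebuild the update from the remaining singular components.
    return (U * s_pruned) @ Vt

# Hypothetical shapes: an 8 x 6 layer adapted with rank r = 2.
rng = np.random.default_rng(0)
B = rng.standard_normal((8, 2))
A = rng.standard_normal((2, 6))

delta_w = lora_update(B, A)
# Dropping index 0 (the largest component) is an arbitrary choice here.
pruned = prune_spectral_components(delta_w, drop_indices=[0])

print("singular values before:", np.linalg.svd(delta_w, compute_uv=False))
print("singular values after: ", np.linalg.svd(pruned, compute_uv=False))
```

Only the adapter factors are needed for this kind of analysis, which is consistent with the abstract's point that no base or instruction-tuned checkpoint is required.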

Taxonomy

Topics: Explainable Artificial Intelligence (XAI) · Adversarial Robustness in Machine Learning · Machine Learning in Healthcare