LSSF: Safety Alignment for Large Language Models through Low-Rank Safety Subspace Fusion

Guanghao Zhou; Panjia Qiu; Cen Chen; Hongyu Li; Mingyuan Chu; Xin Zhang; Jun Zhou

arXiv:2602.00038 · cs.CY · February 3, 2026



TL;DR

LSSF introduces a low-rank safety subspace fusion method that enhances safety alignment in large language models post-fine-tuning, using principal safety components and a novel entropy metric to efficiently restore safety without impairing performance.

Contribution

The paper proposes a novel low-rank safety subspace fusion framework that isolates and restores safety information in LLMs post-fine-tuning, reducing computational costs and improving safety robustness.

Findings

01

Effectively restores safety alignment with minimal performance impact.

02

Uses low-rank projection to extract stable safety components.

03

Introduces safety singular value entropy for dynamic safety rank estimation.
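The summary does not give the definition of safety singular value entropy, so the following is only a rough sketch of one plausible reading: the Shannon entropy of the normalized singular value spectrum, with its exponential used as an effective rank. The function names `singular_value_entropy` and `entropy_rank` and the rounding rule are assumptions for illustration, not the paper's formulation.

```python
import numpy as np

def singular_value_entropy(matrix):
    """Shannon entropy of the normalized singular values (assumed definition)."""
    s = np.linalg.svd(matrix, compute_uv=False)
    total = s.sum()
    if total == 0:
        return 0.0  # a zero matrix carries no spectrum to measure
    p = s / total
    p = p[p > 0]  # drop exact zeros so log() stays finite
    return float(-(p * np.log(p)).sum())

def entropy_rank(matrix):
    """Effective rank exp(H), rounded to an integer and at least 1 (assumed rule)."""
    return max(1, int(round(np.exp(singular_value_entropy(matrix)))))
```

Under this reading, a spectrum concentrated in a few directions gives a low entropy and therefore a small rank, while a flat spectrum gives a rank close to the full dimension.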

Abstract

The safety mechanisms of large language models (LLMs) are notably fragile: even fine-tuning on datasets without harmful content can undermine their safety capabilities. Meanwhile, existing safety alignment methods predominantly rely on the fine-tuning process, which increases complexity and computational cost. To address these issues, we introduce LSSF, a novel safety re-alignment framework with Low-Rank Safety Subspace Fusion. Our proposed method exploits the low-rank characteristics of safety information in LLMs by constructing a low-rank projection matrix to extract the principal components of safety vectors. Notably, this projection matrix represents the low-rank safety subspace of the LLMs, which we have observed to remain stable during the fine-tuning process and is isolated from…
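The low-rank projection described in the abstract can be illustrated with a minimal NumPy sketch. It assumes, purely for illustration, that a safety vector is the weight difference between a safety-aligned model and its base model for a single layer, that the projection matrix is built from the top singular directions of that difference, and that fusion adds the projected safety vector back onto fine-tuned weights. The names `safety_projection`, `fuse`, and the scaling factor `alpha` are hypothetical, and the paper's actual fusion rule may differ.

```python
import numpy as np

def safety_projection(w_base, w_aligned, rank):
    """Build a rank-`rank` projection from the safety vector (assumed: aligned minus base)."""
    safety_vector = w_aligned - w_base
    # Left singular vectors give the principal directions of the safety vector.
    u, _, _ = np.linalg.svd(safety_vector, full_matrices=False)
    u_r = u[:, :rank]
    projection = u_r @ u_r.T  # orthogonal projector onto the top-`rank` subspace
    return projection, safety_vector

def fuse(w_finetuned, projection, safety_vector, alpha=1.0):
    """Add the projected safety vector to fine-tuned weights (assumed fusion rule)."""
    return w_finetuned + alpha * (projection @ safety_vector)
```

When the safety vector really is low-rank, projecting it onto the top `rank` directions loses nothing, which is the property such a method would rely on to keep the re-alignment step cheap.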


Taxonomy

Topics: Adversarial Robustness in Machine Learning · Topic Modeling · Safety Systems Engineering in Autonomy