AlignGuard-LoRA: Alignment-Preserving Fine-Tuning via Fisher-Guided Decomposition and Riemannian-Geodesic Collision Regularization

Amitava Das; Abhilekh Borah; Vinija Jain; Aman Chadha

arXiv:2508.02079·cs.LG·August 5, 2025

Open Access

TL;DR

AlignGuard-LoRA is a fine-tuning framework that preserves language model alignment by combining Fisher-based regularization with Riemannian collision-aware regularization, reducing alignment drift by up to 50% on safety benchmarks.

Contribution

This work introduces AlignGuard-LoRA (AGL), a method that maintains alignment during fine-tuning through Fisher-guided and Riemannian regularization. The method is validated empirically and released as open source.

Findings

01. Mitigates alignment drift by up to 50% on safety benchmarks

02. Each component of AlignGuard-LoRA (AGL) contributes to safety preservation

03. Flattens loss escalation while maintaining adaptation dynamics

Abstract

Low-rank adaptation (LoRA) has become a standard tool for efficiently fine-tuning large language models (LLMs). Yet even minor LoRA updates can induce alignment drift, weakening safety and behavioral constraints through entangled parameter changes. To address this, we propose AlignGuard-LoRA (AGL), a principled framework for preserving alignment during fine-tuning. AGL introduces several key components: a primary task loss for supervision, Fisher Information Matrix-based regularization to restrict updates in alignment-sensitive subspaces, and task-specific regularization to stabilize the integration of new knowledge. We further introduce collision-aware regularization, blending Riemannian overlap, which penalizes coordinate-wise interference, and geodesic separation, which encourages disjoint update geometry. We curate DriftCaps, a targeted diagnostic benchmark of safe and unsafe…
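As a rough illustration of the Fisher-based component mentioned in the abstract, the sketch below penalizes a LoRA update more heavily in coordinates where a diagonal Fisher estimate is large. It is not the paper's formulation: the diagonal approximation, the quadratic penalty form, the random stand-in gradients, and every name used (`lora_delta`, `diag_fisher`, `fisher_penalty`, `lam`) are assumptions made for illustration only.

```python
import numpy as np

def lora_delta(A, B, alpha=1.0):
    """Low-rank weight update delta_W = alpha * B @ A, with rank at most r."""
    return alpha * (B @ A)

def diag_fisher(grads):
    """Diagonal empirical Fisher estimate: mean of squared per-sample gradients.

    grads has shape (n_samples, d_out, d_in).
    """
    return np.mean(grads ** 2, axis=0)

def fisher_penalty(delta_w, fisher, lam=1.0):
    """Quadratic penalty weighting each coordinate of the update by its Fisher value."""
    return lam * float(np.sum(fisher * delta_w ** 2))

rng = np.random.default_rng(0)
d_out, d_in, r = 4, 6, 2
A = rng.normal(size=(r, d_in))       # LoRA down-projection (hypothetical shapes)
B = rng.normal(size=(d_out, r))      # LoRA up-projection
grads = rng.normal(size=(8, d_out, d_in))  # stand-in for gradients of an alignment loss

delta_w = lora_delta(A, B)
fisher = diag_fisher(grads)
task_loss = 0.5                      # placeholder scalar, not a real training loss
total_loss = task_loss + fisher_penalty(delta_w, fisher, lam=0.1)
```

In this toy setup the penalty is zero when the update is zero and grows fastest along coordinates with high Fisher values, which is the general intuition behind restricting updates in sensitive directions. The other components named in the abstract (task-specific and collision-aware regularization) are not modeled here.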

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Adversarial Robustness in Machine Learning · Topic Modeling