AlignMerge - Alignment-Preserving Large Language Model Merging via Fisher-Guided Geometric Constraints
Aniruddha Roy, Jyoti Patel, Aman Chadha, Vinija Jain, Amitava Das

TL;DR
AlignMerge introduces a geometry-aware framework for merging large language models that explicitly preserves alignment by respecting safety geometry, leading to improved safety and task performance without retraining.
Contribution
The paper proposes a novel geometry-constrained merging method, AlignMerge, which explicitly maintains alignment during model merging using Fisher-Rao geometry and a latent-space alignment index.
Findings
Improves alignment metrics, including AQI and toxicity scores, across multiple model families.
Reduces alignment-subspace drift and budget violations compared to existing methods.
Maintains or exceeds expert performance on instruction-following and reasoning tasks.
Abstract
Merging large language models (LLMs) is a practical way to compose capabilities from multiple fine-tuned checkpoints without retraining. Yet standard schemes (linear weight soups, task vectors, and Fisher-weighted averaging) can preserve loss while quietly destroying alignment. We argue that merging is not a numerical trick but a geometry-constrained operation around an already-aligned anchor: fusion must be steered to respect safety geometry, not validated post hoc. We introduce AlignMerge, a geometry-aware merging framework that makes alignment an explicit invariant. In a local Fisher chart around an instruction-tuned base, we estimate an alignment subspace with projector P_A and optimize: L_AlignMerge = L_geo + lambda_align * L_align + lambda_bud * L_bud, where L_geo keeps the merge close to its experts in Fisher-Rao geometry, L_align penalizes motion along alignment-sensitive…
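To make the structure of the objective above concrete, here is a minimal sketch in plain Python on toy parameter vectors. It is an illustration under stated assumptions, not the authors' implementation. L_geo is approximated by a diagonal-Fisher quadratic distance to each expert. L_align is taken as the squared Euclidean norm of the displacement from the base after projection by P_A, with P_A built from an orthonormal basis. The abstract is cut off before L_bud is defined, so the budget term is left as a caller-supplied function. All function and parameter names (`fisher_sq_dist`, `project`, `alignmerge_objective`, `budget_term`) are hypothetical.

```python
from typing import Callable, Sequence

Vector = Sequence[float]


def fisher_sq_dist(theta: Vector, expert: Vector, fisher_diag: Vector) -> float:
    """Squared distance under a diagonal Fisher metric.

    Assumption: a local quadratic stand-in for Fisher-Rao distance.
    """
    return sum(f * (t - e) ** 2 for t, e, f in zip(theta, expert, fisher_diag))


def project(basis: Sequence[Vector], v: Vector) -> list:
    """Apply P_A = sum_k u_k u_k^T, assuming the rows u_k are orthonormal."""
    out = [0.0] * len(v)
    for u in basis:
        coeff = sum(ui * vi for ui, vi in zip(u, v))
        for i, ui in enumerate(u):
            out[i] += coeff * ui
    return out


def alignmerge_objective(
    theta: Vector,
    base: Vector,
    experts: Sequence[Vector],
    fisher_diag: Vector,
    align_basis: Sequence[Vector],
    budget_term: Callable[[Vector], float],
    lam_align: float,
    lam_bud: float,
) -> float:
    """Evaluate L_geo + lam_align * L_align + lam_bud * L_bud for a candidate merge."""
    # L_geo: keep the merged parameters close to every expert.
    l_geo = sum(fisher_sq_dist(theta, e, fisher_diag) for e in experts)
    # L_align: penalize displacement from the base inside the alignment subspace.
    delta = [t - b for t, b in zip(theta, base)]
    l_align = sum(m * m for m in project(align_basis, delta))
    # L_bud: not specified in the truncated abstract, so it is supplied by the caller.
    l_bud = budget_term(theta)
    return l_geo + lam_align * l_align + lam_bud * l_bud


if __name__ == "__main__":
    value = alignmerge_objective(
        theta=[0.5, 0.5],
        base=[0.0, 0.0],
        experts=[[1.0, 0.0], [0.0, 1.0]],
        fisher_diag=[1.0, 2.0],
        align_basis=[[1.0, 0.0]],      # toy alignment subspace: first coordinate
        budget_term=lambda t: 0.0,     # placeholder budget term
        lam_align=2.0,
        lam_bud=1.0,
    )
    print(value)
```

In this toy setup, raising `lam_align` makes any movement along the first coordinate more expensive, so a minimizer would be pushed to resolve expert disagreement in directions outside the alignment subspace.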
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Topic Modeling · Explainable Artificial Intelligence (XAI)
