AlignMerge - Alignment-Preserving Large Language Model Merging via Fisher-Guided Geometric Constraints

Aniruddha Roy; Jyoti Patel; Aman Chadha; Vinija Jain; Amitava Das

arXiv:2512.16245 · cs.AI · December 19, 2025

TL;DR

AlignMerge introduces a geometry-aware framework for merging large language models that explicitly preserves alignment by respecting safety geometry, leading to improved safety and task performance without retraining.

Contribution

The paper proposes a novel geometry-constrained merging method, AlignMerge, which explicitly maintains alignment during model merging using Fisher-Rao geometry and a latent-space alignment index.

Findings

01 Improves alignment metrics such as AQI and toxicity across multiple model families.

02 Reduces alignment-subspace drift and budget violations compared to existing methods.

03 Maintains or exceeds expert performance on instruction-following and reasoning tasks.

Abstract

Merging large language models (LLMs) is a practical way to compose capabilities from multiple fine-tuned checkpoints without retraining. Yet standard schemes (linear weight soups, task vectors, and Fisher-weighted averaging) can preserve loss while quietly destroying alignment. We argue that merging is not a numerical trick but a geometry-constrained operation around an already-aligned anchor: fusion must be steered to respect safety geometry, not validated post hoc. We introduce AlignMerge, a geometry-aware merging framework that makes alignment an explicit invariant. In a local Fisher chart around an instruction-tuned base, we estimate an alignment subspace with projector P_A and optimize: L_AlignMerge = L_geo + lambda_align * L_align + lambda_bud * L_bud, where L_geo keeps the merge close to its experts in Fisher-Rao geometry, L_align penalizes motion along alignment-sensitive…
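
The composite objective in the abstract can be illustrated with a small numerical sketch. This is not the authors' implementation; it only shows how the three terms might combine under assumptions that go beyond the abstract: a diagonal Fisher approximation stands in for the Fisher-Rao metric, the projector is built as P_A = U U^T from an orthonormal basis U, L_align is taken as the squared Euclidean norm of the projected displacement from the anchor, and L_bud is passed in as a precomputed number because the abstract is truncated before defining it. All function and parameter names are hypothetical.

```python
import numpy as np


def fisher_sq_dist(theta, ref, fisher_diag):
    """Squared distance between two parameter vectors under a diagonal
    Fisher metric (assumption: local quadratic approximation)."""
    d = theta - ref
    return float(np.sum(fisher_diag * d * d))


def alignmerge_objective(theta, experts, anchor, fisher_diag, basis_A,
                         l_bud=0.0, lambda_align=1.0, lambda_bud=1.0):
    """L_geo + lambda_align * L_align + lambda_bud * L_bud.

    theta       : candidate merged parameters, shape (d,)
    experts     : list of expert parameter vectors, each shape (d,)
    anchor      : aligned base parameters, shape (d,)
    fisher_diag : diagonal Fisher estimate at the anchor, shape (d,)
    basis_A     : orthonormal columns spanning the alignment subspace, shape (d, k)
    l_bud       : budget term, supplied by the caller (not defined in the abstract)
    """
    # L_geo (assumed form): mean Fisher-weighted squared distance to the experts.
    l_geo = float(np.mean([fisher_sq_dist(theta, e, fisher_diag) for e in experts]))

    # L_align (assumed form): squared norm of the displacement from the anchor
    # after projection onto the alignment subspace, P_A = U U^T.
    delta = theta - anchor
    projected = basis_A @ (basis_A.T @ delta)
    l_align = float(projected @ projected)

    return l_geo + lambda_align * l_align + lambda_bud * l_bud


if __name__ == "__main__":
    anchor = np.zeros(3)
    experts = [np.array([1.0, 0.0, 0.0]), np.array([0.0, 1.0, 0.0])]
    fisher_diag = np.ones(3)
    basis_A = np.array([[0.0], [0.0], [1.0]])  # third axis is "alignment-sensitive"

    safe = np.array([0.5, 0.5, 0.0])      # stays out of the alignment subspace
    drifting = np.array([0.5, 0.5, 1.0])  # moves along the alignment direction
    print(alignmerge_objective(safe, experts, anchor, fisher_diag, basis_A))
    print(alignmerge_objective(drifting, experts, anchor, fisher_diag, basis_A))
```

In this toy setup the two candidates sit at the same midpoint between the experts in the first two coordinates, so the only difference in their scores comes from the second candidate's movement along the assumed alignment direction.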

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Topic Modeling · Explainable Artificial Intelligence (XAI)