Functionality-Oriented LLM Merging on the Fisher–Rao Manifold
Jiayu Wang, Zuojun Ye, and Wenpeng Yin

TL;DR
This paper introduces a method for merging multiple fine-tuned large language models by operating on the Fisher–Rao manifold, improving stability and performance over traditional Euclidean approaches.
Contribution
It formulates model merging as a weighted Karcher mean on the Fisher–Rao manifold, addressing limitations of existing Euclidean and geometry-inspired methods, especially for multiple and heterogeneous models.
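To make the idea of a weighted Karcher mean concrete, here is a minimal toy sketch. It is not the paper's algorithm for LLM weights. It averages small categorical distributions under the Fisher–Rao metric, using the standard fact that the square-root map sends the probability simplex onto the positive part of the unit sphere, where Fisher–Rao geodesics are great-circle arcs. The function names (`sphere_log`, `sphere_exp`, `fisher_rao_karcher_mean`) and the fixed-point iteration are assumptions made for illustration.

```python
import numpy as np

def sphere_log(mu, x):
    """Log map on the unit sphere: tangent vector at mu pointing toward x."""
    cos_t = float(np.clip(np.dot(mu, x), -1.0, 1.0))
    v = x - cos_t * mu                 # component of x orthogonal to mu
    nv = np.linalg.norm(v)             # equals sin(angle) for unit vectors
    if nv < 1e-12:
        return np.zeros_like(mu)
    theta = np.arctan2(nv, cos_t)      # geodesic angle between mu and x
    return theta * v / nv

def sphere_exp(mu, v):
    """Exp map on the unit sphere: walk from mu along tangent vector v."""
    norm = np.linalg.norm(v)
    if norm < 1e-12:
        return mu
    return np.cos(norm) * mu + np.sin(norm) * v / norm

def fisher_rao_karcher_mean(dists, weights, iters=100, tol=1e-10):
    """Weighted Karcher mean of categorical distributions (toy illustration).

    dists: (N, K) array, each row a probability vector.
    weights: length-N nonnegative weights (normalized internally).
    """
    dists = np.asarray(dists, dtype=float)
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    pts = np.sqrt(dists)               # square-root map: rows have unit norm
    mu = w @ pts                       # Euclidean blend as the starting point
    mu = mu / np.linalg.norm(mu)
    for _ in range(iters):
        # Weighted average of log maps is the Riemannian descent direction.
        step = sum(wi * sphere_log(mu, x) for wi, x in zip(w, pts))
        if np.linalg.norm(step) < tol:
            break
        mu = sphere_exp(mu, step)
    return mu ** 2                     # map back to the simplex
```

With weights (0.75, 0.25) on the two point masses [1, 0] and [0, 1], this returns approximately [0.854, 0.146], whereas a Euclidean blend of the probabilities would give [0.75, 0.25]. The same loop accepts any number of inputs, which is the property that makes a Karcher-mean objective natural for more than two models.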
Findings
Outperforms prior baselines across benchmarks.
Remains stable with increasing model heterogeneity.
Reduces representation collapse and maintains accuracy during merging.
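The abstract describes collapse in terms of activation variance shrinkage and effective-rank degradation. The excerpt does not give the paper's exact definitions, so the sketch below uses common choices as assumptions: mean per-feature variance, and the entropy-based effective rank (the exponential of the Shannon entropy of the normalized singular values). The function names and the `(n_samples, n_features)` activation layout are hypothetical.

```python
import numpy as np

def activation_variance(acts):
    """Mean per-feature variance of an (n_samples, n_features) activation matrix."""
    acts = np.asarray(acts, dtype=float)
    return float(np.var(acts, axis=0).mean())

def effective_rank(acts, eps=1e-12):
    """Entropy-based effective rank (assumed definition, not taken from the paper).

    Computes exp(H(p)), where p is the singular-value spectrum of the
    mean-centered activations, normalized to sum to 1.
    """
    acts = np.asarray(acts, dtype=float)
    centered = acts - acts.mean(axis=0, keepdims=True)
    s = np.linalg.svd(centered, compute_uv=False)
    p = s / (s.sum() + eps)
    p = p[p > eps]                     # drop numerically zero directions
    return float(np.exp(-(p * np.log(p)).sum()))
```

Under these definitions, a healthy layer with K roughly independent features has an effective rank near K, while activations that all vary along a single direction give a value near 1. Comparing both numbers between the source checkpoints and the merged model is one way to check for the collapse the abstract describes.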
Abstract
Weight-space merging aims to combine multiple fine-tuned LLMs into a single model without retraining, yet most existing approaches remain fundamentally parameter-space heuristics. This creates three practical limitations. First, linear averaging, task vectors, and related rules operate on Euclidean coordinates, even though the desired goal is to merge functionality, i.e., predictive behaviors across tasks. Second, when the source checkpoints are farther apart or more heterogeneous, Euclidean blends often trigger representation collapse, manifested as activation variance shrinkage and effective-rank degradation, which sharply degrades accuracy. Third, many geometry-inspired methods are most natural for two-model interpolation and do not extend cleanly to merging N>2 experts with a principled objective. We address these issues by formulating model merging as computing a weighted Karcher…
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Explainable Artificial Intelligence (XAI) · Stochastic Gradient Optimization Techniques
