Orthogonal Model Merging
Sihan Yang, Kexuan Shi, Weiyang Liu

TL;DR
This paper introduces Orthogonal Model Merging (OrthoMerge), an approach that merges finetuned large language models on the Riemannian manifold formed by the orthogonal group rather than in Euclidean space, with the aim of preserving the geometric properties of the pretrained weights and improving task performance.
Contribution
OrthoMerge is the first method to perform model merging on the orthogonal group manifold, preserving geometric structure and extending to various finetuning techniques.
Findings
OrthoMerge effectively mitigates catastrophic forgetting.
It maintains model performance across diverse tasks.
The method outperforms merging approaches that rely on linear arithmetic in Euclidean space.
Abstract
Merging finetuned Large Language Models (LLMs) has become increasingly important for integrating diverse capabilities into a single unified model. However, prevailing model merging methods rely on linear arithmetic in Euclidean space, which often destroys the intrinsic geometric properties of pretrained weights, such as hyperspherical energy. To address this, we propose Orthogonal Model Merging (OrthoMerge), a method that performs merging operations on the Riemannian manifold formed by the orthogonal group to preserve the geometric structure of the model's weights. By mapping task-specific orthogonal matrices learned by Orthogonal Finetuning (OFT) to the Lie algebra, OrthoMerge enables a principled yet efficient integration that takes into account both the direction and intensity of adaptations. In addition to directly leveraging orthogonal matrices obtained by OFT, we further extend…
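The idea of merging in the Lie algebra rather than in Euclidean space can be illustrated with a small NumPy sketch. It is not the authors' implementation: it assumes each task is represented by an orthogonal matrix given by the Cayley transform of a skew-symmetric generator, and it uses a plain weighted average of the generators as a stand-in for OrthoMerge's actual merging rule, which the abstract says also accounts for the direction and intensity of each adaptation. The names `cayley`, `inverse_cayley`, `merge_orthogonal`, and `orth_error` are hypothetical helpers introduced here.

```python
import numpy as np

def cayley(Q):
    """Map a skew-symmetric matrix Q to the orthogonal matrix (I + Q)(I - Q)^{-1}."""
    I = np.eye(Q.shape[0])
    return (I + Q) @ np.linalg.inv(I - Q)

def inverse_cayley(R):
    """Recover the skew-symmetric generator (R - I)(R + I)^{-1}.

    Requires that -1 is not an eigenvalue of R.
    """
    I = np.eye(R.shape[0])
    return (R - I) @ np.linalg.inv(R + I)

def merge_orthogonal(rotations, weights=None):
    """Average the generators, then map back to the orthogonal group.

    Assumption: a simple weighted mean, not the paper's merging rule.
    """
    gens = [inverse_cayley(R) for R in rotations]
    if weights is None:
        weights = [1.0 / len(gens)] * len(gens)
    Q = sum(w * G for w, G in zip(weights, gens))
    return cayley(Q)

def orth_error(M):
    """Frobenius distance between M^T M and the identity."""
    return np.linalg.norm(M.T @ M - np.eye(M.shape[0]))

rng = np.random.default_rng(0)
d = 4

def random_skew():
    A = rng.standard_normal((d, d))
    return 0.3 * (A - A.T)  # skew-symmetric by construction

# Two hypothetical task-specific orthogonal matrices.
R_a, R_b = cayley(random_skew()), cayley(random_skew())

merged = merge_orthogonal([R_a, R_b])  # merge via the generators
linear = 0.5 * (R_a + R_b)             # naive Euclidean average

# Toy "pretrained" weight matrix; its columns play the role of neurons.
W = rng.standard_normal((d, 6))
gram_before = W.T @ W
gram_merged = (merged @ W).T @ (merged @ W)

print("orthogonality error, generator merge:", orth_error(merged))
print("orthogonality error, linear merge:   ", orth_error(linear))
print("Gram matrix preserved by generator merge:",
      np.allclose(gram_before, gram_merged))
```

Because the merged matrix stays orthogonal, applying it to `W` leaves all pairwise inner products between columns unchanged, which is the kind of geometric structure the abstract refers to. The Euclidean average of two distinct orthogonal matrices is generally not orthogonal, so it offers no such guarantee.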
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Advanced Graph Neural Networks · Domain Adaptation and Few-Shot Learning
