Orthogonal Model Merging

Sihan Yang; Kexuan Shi; Weiyang Liu

arXiv:2602.05943·cs.LG·February 6, 2026

Open Access

TL;DR

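This paper introduces Orthogonal Model Merging (OrthoMerge), which merges finetuned large language models on the Riemannian manifold formed by the orthogonal group instead of by linear arithmetic in Euclidean space, in order to preserve geometric properties of the pretrained weights (such as hyperspherical energy) and improve task performance.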

Contribution

OrthoMerge is the first method to perform model merging on the orthogonal group manifold, preserving geometric structure and extending to various finetuning techniques.

Findings

01 OrthoMerge effectively mitigates catastrophic forgetting.

02 It maintains model performance across diverse tasks.

03 The method outperforms linear merging approaches.

Abstract

Merging finetuned Large Language Models (LLMs) has become increasingly important for integrating diverse capabilities into a single unified model. However, prevailing model merging methods rely on linear arithmetic in Euclidean space, which often destroys the intrinsic geometric properties of pretrained weights, such as hyperspherical energy. To address this, we propose Orthogonal Model Merging (OrthoMerge), a method that performs merging operations on the Riemannian manifold formed by the orthogonal group to preserve the geometric structure of the model's weights. By mapping task-specific orthogonal matrices learned by Orthogonal Finetuning (OFT) to the Lie algebra, OrthoMerge enables a principled yet efficient integration that takes into account both the direction and intensity of adaptations. In addition to directly leveraging orthogonal matrices obtained by OFT, we further extend…
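The idea of merging in the Lie algebra rather than in Euclidean space can be illustrated with a minimal sketch. It is not the paper's algorithm: the abstract is truncated and does not give the exact merging rule, so the sketch assumes a plain average of skew-symmetric generators, uses the Cayley transform as a simple stand-in for the maps between the orthogonal group and its Lie algebra, and assumes the adapted weight has the form Q W0 with neurons stored as columns. All names (`cayley`, `inverse_cayley`, `random_skew`, `W0`, `Q_tasks`) are hypothetical, and the task matrices are random placeholders rather than matrices learned by OFT.

```python
import numpy as np

def cayley(A):
    """Map a skew-symmetric matrix A to the orthogonal matrix (I - A)^{-1} (I + A)."""
    I = np.eye(A.shape[0])
    return np.linalg.solve(I - A, I + A)

def inverse_cayley(Q):
    """Map an orthogonal matrix Q (no eigenvalue -1) back to a skew-symmetric matrix."""
    I = np.eye(Q.shape[0])
    return np.linalg.solve(Q + I, Q - I)

def random_skew(rng, n, scale=0.3):
    """Random skew-symmetric matrix, used only to fabricate toy task rotations."""
    M = scale * rng.normal(size=(n, n))
    return M - M.T

def orthogonality_error(Q):
    """Frobenius norm of Q^T Q - I; zero for an exactly orthogonal matrix."""
    return np.linalg.norm(Q.T @ Q - np.eye(Q.shape[0]))

rng = np.random.default_rng(0)
n = 6
W0 = rng.normal(size=(n, 4))  # stand-in pretrained weight, neurons as columns (assumption)

# Two hypothetical task-specific orthogonal matrices (placeholders for OFT outputs).
Q_tasks = [cayley(random_skew(rng, n)) for _ in range(2)]

# Merge in the space of skew-symmetric matrices, then map back to the orthogonal group.
A_merged = np.mean([inverse_cayley(Q) for Q in Q_tasks], axis=0)
Q_merged = cayley(A_merged)
W_merged = Q_merged @ W0

# Baseline: linear (Euclidean) averaging of the orthogonal matrices themselves.
Q_linear = np.mean(Q_tasks, axis=0)

print("orthogonality error, merged via generators:", orthogonality_error(Q_merged))
print("orthogonality error, linear average:       ", orthogonality_error(Q_linear))
```

Because `Q_merged` stays orthogonal, the Gram matrix of the columns of `W_merged` equals that of `W0`, so pairwise angles between neurons are unchanged; the linear average `Q_linear` is generally not orthogonal and offers no such guarantee.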

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Advanced Graph Neural Networks · Domain Adaptation and Few-Shot Learning