MOMA: Masked Orthogonal Matrix Alignment for Zero-Additional-Parameter Model Merging

Fanshuang Kong; Richong Zhang; Zhijie Nie; Hang Zhou; Ziqiao Wang; Qiang Sun; Chunming Hu

arXiv:2412.13526 · cs.LG · February 3, 2026

TL;DR

MOMA is a zero-additional-parameter model merging technique that uses orthogonal transformations to correct the misalignment between a merged encoder and its task-specific classifier heads, improving the merged model's classification performance without extra parameters or added inference cost.

Contribution

The paper proposes MOMA, a novel method that aligns model representations via orthogonal transformations, eliminating the need for auxiliary parameters in model merging.

Findings

01 Achieves performance comparable to state-of-the-art methods

02 Adds zero parameters to the merged model

03 Adds no inference cost

Abstract

Model merging offers a scalable alternative to multi-task learning but often yields suboptimal performance on classification tasks. We attribute this degradation to a geometric misalignment between the merged encoder and static task-specific classifier heads. Existing methods typically rely on auxiliary parameters to enforce strict representation alignment. We challenge this approach by revealing that the misalignment is predominantly an orthogonal transformation, rendering such strict alignment unnecessary. Leveraging this insight, we propose MOMA (Masked Orthogonal Matrix Alignment), which rectifies the misalignment by jointly optimizing a global multi-task vector mask and task-specific orthogonal transformations. Crucially, MOMA absorbs corresponding new parameters directly into the existing model weights, achieving performance comparable to state-of-the-art baselines with zero…
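The core idea in the abstract, that an orthogonal misalignment can be corrected and then folded into existing weights, can be illustrated with a small NumPy sketch. This is not the authors' implementation: the data is synthetic, the mask is omitted, the rotation is solved in closed form with orthogonal Procrustes rather than by the joint optimization the abstract describes, and the names (`procrustes_rotation`, `w_task`, `w_merged`, `head_w`) are hypothetical. It also assumes the encoder ends in a linear layer, which is only one possible way to absorb the rotation.

```python
import numpy as np

def procrustes_rotation(source, target):
    """Orthogonal R minimizing ||source @ R - target||_F (orthogonal Procrustes)."""
    u, _, vt = np.linalg.svd(source.T @ target)
    return u @ vt

rng = np.random.default_rng(0)
n, h, d, c = 256, 32, 16, 4  # samples, hidden size, feature size, classes

# Synthetic stand-ins (assumptions): inputs to the encoder's last linear layer,
# the task-specific last layer, and the frozen task-specific classifier head.
hidden = rng.normal(size=(n, h))
w_task = rng.normal(size=(h, d))
head_w = rng.normal(size=(d, c))

# Simulate a merged encoder whose features differ from the task encoder's
# features by an unknown orthogonal transformation q.
q, _ = np.linalg.qr(rng.normal(size=(d, d)))
w_merged = w_task @ q

task_feats = hidden @ w_task
merged_feats = hidden @ w_merged

# Estimate the rotation that maps merged features back onto task features.
r = procrustes_rotation(merged_feats, task_feats)

# Absorb the rotation into the existing weight matrix: the shape is unchanged,
# so no parameters are added and inference runs the same single matmul.
w_aligned = w_merged @ r

task_logits = task_feats @ head_w
misaligned_logits = merged_feats @ head_w
aligned_logits = hidden @ w_aligned @ head_w

print(w_aligned.shape == w_merged.shape)
print(np.allclose(aligned_logits, task_logits))
```

In this toy setting the frozen head produces the wrong logits on the rotated features, and the original logits are recovered once the rotation is absorbed. Real merged encoders would be only approximately related by an orthogonal map, so the recovery would be approximate rather than exact.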

Code & Models

Repositories

fskong/FT-Classifier-for-Model-Merging
PyTorch · Official

Taxonomy

Topics: Machine Learning and Data Classification · Semantic Web and Ontologies