MOMA: Masked Orthogonal Matrix Alignment for Zero-Additional-Parameter Model Merging
Fanshuang Kong, Richong Zhang, Zhijie Nie, Hang Zhou, Ziqiao Wang, Qiang Sun, Chunming Hu

TL;DR
MOMA is a model merging technique with zero additional parameters: it corrects the misalignment between the merged encoder and the task-specific classifier heads through orthogonal transformations, improving multi-task performance without extra parameters or inference cost.
Contribution
The paper proposes MOMA, a method that aligns merged-encoder representations with task-specific classifiers via orthogonal transformations, removing the need for auxiliary parameters in model merging.
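The claim that the encoder-classifier misalignment is "predominantly an orthogonal transformation" can be illustrated with the classic orthogonal Procrustes problem. The sketch below is a toy, not the authors' method: it assumes synthetic features (`feats_task`) and a hypothetical ground-truth rotation (`true_q`), then recovers that rotation in closed form from an SVD. MOMA itself learns its orthogonal transforms jointly with a mask rather than via this closed-form fit.

```python
import numpy as np


def procrustes_rotation(src: np.ndarray, tgt: np.ndarray) -> np.ndarray:
    """Return the orthogonal R minimizing ||src @ R - tgt||_F (orthogonal Procrustes)."""
    # SVD of the cross-covariance src^T tgt; R = U V^T is the closed-form optimum.
    u, _, vt = np.linalg.svd(src.T @ tgt)
    return u @ vt


rng = np.random.default_rng(0)
n, d = 256, 16

# Hypothetical features from an individually fine-tuned encoder (toy data).
feats_task = rng.standard_normal((n, d))
# Hypothetical orthogonal misalignment introduced by merging.
true_q, _ = np.linalg.qr(rng.standard_normal((d, d)))
# Toy merged-encoder features: a rotated copy of the fine-tuned features.
feats_merged = feats_task @ true_q.T

r = procrustes_rotation(feats_merged, feats_task)
print(np.allclose(feats_merged @ r, feats_task))  # rotation maps merged back to task features
print(np.allclose(r.T @ r, np.eye(d)))            # R is orthogonal
```

Because an orthogonal map preserves norms and angles, undoing it requires no change to the classifier's geometry, which is why strict, parameter-heavy alignment can be unnecessary when the misalignment is (close to) a rotation.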
Findings
Achieves comparable performance to state-of-the-art methods
Operates with zero additional parameters
Adds no extra inference cost
Abstract
Model merging offers a scalable alternative to multi-task learning but often yields suboptimal performance on classification tasks. We attribute this degradation to a geometric misalignment between the merged encoder and static task-specific classifier heads. Existing methods typically rely on auxiliary parameters to enforce strict representation alignment. We challenge this approach by revealing that the misalignment is predominantly an orthogonal transformation, rendering such strict alignment unnecessary. Leveraging this insight, we propose MOMA (Masked Orthogonal Matrix Alignment), which rectifies the misalignment by jointly optimizing a global multi-task vector mask and task-specific orthogonal transformations. Crucially, MOMA absorbs corresponding new parameters directly into the existing model weights, achieving performance comparable to state-of-the-art baselines with zero…
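Why the new parameters cost nothing at inference follows from linearity: a task-specific orthogonal matrix applied to encoder features can be multiplied into that task's classifier head once, offline, and a mask on the task vectors can be folded into the merged weights. The toy sketch below assumes a flattened linear encoder, a binary mask applied to the sum of task vectors, and an orthogonal transform applied just before the head. These placements are illustrative assumptions; the abstract does not specify where MOMA applies its mask and transforms.

```python
import numpy as np

rng = np.random.default_rng(1)
d, num_classes, num_tasks = 8, 3, 2

# Toy flattened pre-trained encoder weights and per-task task vectors.
theta_pre = rng.standard_normal(d * d)
task_vectors = [0.1 * rng.standard_normal(d * d) for _ in range(num_tasks)]
# Hypothetical global binary mask (MOMA learns its mask; here it is random).
mask = (rng.random(d * d) > 0.5).astype(float)

# The mask is folded into the merged weights, so no mask is stored separately.
theta_merged = theta_pre + mask * np.sum(task_vectors, axis=0)
w_enc = theta_merged.reshape(d, d)

# Frozen task-specific classifier head and a toy task-specific orthogonal transform.
head = rng.standard_normal((num_classes, d))
q, _ = np.linalg.qr(rng.standard_normal((d, d)))

x = rng.standard_normal(d)
h = w_enc @ x

logits_explicit = head @ (q @ h)  # Q kept as an extra layer at inference
head_absorbed = head @ q          # Q folded into the head once, offline
logits_absorbed = head_absorbed @ h

print(np.allclose(logits_explicit, logits_absorbed))  # identical predictions
print(head_absorbed.shape == head.shape)              # same parameter count as before
```

After absorption, the deployed model has exactly the shapes of the original merged encoder and heads, which is the sense in which the method adds zero parameters and zero inference overhead.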
Taxonomy
Topics: Machine Learning and Data Classification · Semantic Web and Ontologies
