Decom-Renorm-Merge: Model Merging on the Right Space Improves Multitasking

Yuatyong Chaichana; Thanapat Trachu; Peerat Limkonchotiwat; Konpat Preechakul; Tirasan Khandhawit; Ekapol Chuangsuwanich

arXiv:2505.23117·cs.LG·October 30, 2025

TL;DR

Decom-Renorm-Merge (DRM) introduces a novel SVD-based approach to align and merge neural network models in a shared space, improving multitasking capabilities across various architectures.

Contribution

The paper proposes DRM, a new method that uses Singular Value Decomposition to enable effective model merging by aligning weight matrices in a joint space, outperforming existing techniques.

Findings

01. DRM outperforms state-of-the-art merging methods.

02. Renormalization is key to creating a robust joint space.

03. DRM is effective across various model architectures and sizes.

Abstract

In the era of large-scale training, model merging has evolved into a tool for creating multitasking models efficiently. It enables the knowledge of models to be fused, without the need for heavy computation as required in traditional multitask learning. Existing merging methods often assume that entries at identical positions in weight matrices serve the same function, enabling straightforward entry-wise comparison and merging. However, this assumption overlooks the complexity of finetuned neural networks, where neurons may develop distinct feature compositions, making direct entry-wise merging problematic. We present Decom-Renorm-Merge (DRM), a simple yet effective approach that leverages Singular Value Decomposition to decompose and coordinate weight matrices into an aligned joint space, where entry-wise merging becomes possible. We showcase the effectiveness of DRM across various…
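The abstract's central idea, decomposing weight matrices with SVD so that they share a joint space in which entry-wise merging is meaningful, can be illustrated with a small NumPy sketch. This is not the authors' implementation. It assumes each model is represented by a weight update (fine-tuned weights minus base weights), that the updates are concatenated side by side before a single SVD, and that the per-task blocks are renormalized to unit-length rows and then combined by plain averaging. The function name `merge_in_joint_space`, the concatenation axis, and the averaging rule are all hypothetical choices made for illustration; the paper's actual merging rule may differ.

```python
import numpy as np


def merge_in_joint_space(deltas, eps=1e-12):
    """Illustrative sketch only (not the paper's exact algorithm).

    deltas: list of T arrays of identical shape (m, n), each assumed to be
    a fine-tuned weight matrix minus the shared base weight matrix.
    """
    num_tasks = len(deltas)

    # Decompose: one SVD over the side-by-side stack gives every task the
    # same left basis `u` and the same singular values `s`.
    stacked = np.concatenate(deltas, axis=1)                # (m, T*n)
    u, s, vt = np.linalg.svd(stacked, full_matrices=False)  # vt: (k, T*n)

    # Split the right factor back into one (k, n) block per task.
    blocks = np.split(vt, num_tasks, axis=1)

    # Renormalize: give each block unit-length rows and move the removed
    # row norms into a per-task scale, so s * block == scale * unit.
    norms = [np.linalg.norm(b, axis=1, keepdims=True) for b in blocks]
    units = [b / np.maximum(n, eps) for b, n in zip(blocks, norms)]
    scales = [s[:, None] * n for n in norms]                # each (k, 1)

    # Merge: entry-wise averaging in the shared coordinates (an assumed,
    # simplified merging rule chosen for this sketch).
    merged_unit = np.mean(units, axis=0)                    # (k, n)
    merged_scale = np.mean(scales, axis=0)                  # (k, 1)

    # Map the merged coordinates back to the original weight space.
    return u @ (merged_scale * merged_unit)


rng = np.random.default_rng(0)
delta_a = rng.normal(size=(6, 4))
delta_b = rng.normal(size=(6, 4))
merged = merge_in_joint_space([delta_a, delta_b])
print(merged.shape)  # (6, 4)
```

Under these assumptions, merging two identical updates returns that update unchanged, which is a useful sanity check. Averaging directions and scales separately is what distinguishes this sketch from simply averaging the raw weight updates.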

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Advanced Neural Network Applications · Generative Adversarial Networks and Image Synthesis

Methods: Gated Linear Unit · Attention Is All You Need · Linear Layer · Byte Pair Encoding · SentencePiece · Multi-Head Attention · Layer Normalization · Inverse Square Root Schedule