Decom-Renorm-Merge: Model Merging on the Right Space Improves Multitasking
Yuatyong Chaichana, Thanapat Trachu, Peerat Limkonchotiwat, Konpat Preechakul, Tirasan Khandhawit, Ekapol Chuangsuwanich

TL;DR
Decom-Renorm-Merge (DRM) introduces a novel SVD-based approach to align and merge neural network models in a shared space, improving multitasking capabilities across various architectures.
Contribution
The paper proposes DRM, a new method that uses Singular Value Decomposition to enable effective model merging by aligning weight matrices in a joint space, outperforming existing techniques.
Findings
DRM outperforms state-of-the-art merging methods.
Renormalization is key to creating a robust joint space.
DRM is effective across various model architectures and sizes.
Abstract
In the era of large-scale training, model merging has evolved into a tool for creating multitasking models efficiently. It enables the knowledge of models to be fused, without the need for heavy computation as required in traditional multitask learning. Existing merging methods often assume that entries at identical positions in weight matrices serve the same function, enabling straightforward entry-wise comparison and merging. However, this assumption overlooks the complexity of finetuned neural networks, where neurons may develop distinct feature compositions, making direct entry-wise merging problematic. We present Decom-Renorm-Merge (DRM), a simple yet effective approach that leverages Singular Value Decomposition to decompose and coordinate weight matrices into an aligned joint space, where entry-wise merging becomes possible. We showcase the effectiveness of DRM across various…
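The decompose, renormalize, and merge steps described in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation. It assumes that the inputs are same-shaped weight deltas (finetuned weights minus a common base), that the joint space comes from one SVD of the horizontally concatenated deltas, and that the entry-wise merge is a TIES-style trim, sign election, and mean over agreeing entries. The function names `decompose_renorm`, `trim`, and `merge_in_joint_space`, and the `density` parameter, are hypothetical.

```python
import numpy as np


def decompose_renorm(deltas, eps=1e-12):
    """Decompose same-shaped deltas in a shared basis, then renormalize.

    Assumption: the joint space is the left singular basis U of the
    horizontally concatenated deltas. Returns U plus, for each task,
    rescaled singular values and unit-norm right-factor rows.
    """
    d_in = deltas[0].shape[1]
    stacked = np.concatenate(deltas, axis=1)           # (d_out, k * d_in)
    U, S, Vt = np.linalg.svd(stacked, full_matrices=False)
    sigmas, units = [], []
    for i in range(len(deltas)):
        block = Vt[:, i * d_in:(i + 1) * d_in]         # this task's slice of Vt
        norms = np.linalg.norm(block, axis=1)          # slices are not unit-norm
        units.append(block / np.maximum(norms, eps)[:, None])
        sigmas.append(S * norms)                       # fold the norm into sigma
    return U, sigmas, units


def trim(x, density):
    """Keep roughly the top `density` fraction of entries by magnitude."""
    if density >= 1.0:
        return x
    k = max(1, int(round(density * x.size)))
    threshold = np.sort(np.abs(x), axis=None)[-k]
    return np.where(np.abs(x) >= threshold, x, 0.0)


def merge_in_joint_space(deltas, density=1.0):
    """Entry-wise merge in the joint space (TIES-style rule, an assumption)."""
    U, sigmas, units = decompose_renorm(deltas)
    coords = np.stack(
        [s[:, None] * trim(u, density) for s, u in zip(sigmas, units)]
    )                                                  # (k, r, d_in)
    elected = np.sign(coords.sum(axis=0))              # per-entry sign election
    agree = (np.sign(coords) == elected) & (coords != 0)
    count = np.maximum(agree.sum(axis=0), 1)           # avoid division by zero
    merged = (coords * agree).sum(axis=0) / count      # mean over agreeing tasks
    return U @ merged                                  # back to weight space


rng = np.random.default_rng(0)
task_a = rng.normal(size=(6, 4))                       # toy stand-ins for deltas
task_b = rng.normal(size=(6, 4))
merged_delta = merge_in_joint_space([task_a, task_b], density=0.5)
print(merged_delta.shape)
```

In this sketch each delta is recovered exactly as `U @ (sigma[:, None] * unit)`, so the renormalization changes only how entries are scaled before trimming, not the information they carry. With a purely linear merge rule such as a plain mean, merging in the joint space would give the same result as merging the raw deltas, which is why a non-linear rule is used here.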
Code & Models
- 🤗 yophis/DRM-Llama-3.1-8B-cola (model · 1 download)
- 🤗 yophis/DRM-Llama-3.1-8B-sst2 (model · 1 download)
- 🤗 yophis/DRM-Llama-3.1-8B-mnli (model · 2 downloads · 1 like)
- 🤗 yophis/DRM-Llama-3.1-8B-qnli (model · 7 downloads)
- 🤗 yophis/DRM-Llama-3.1-8B-rte (model · 1 download)
- 🤗 yophis/DRM-T5-Base-paws (model · 13 downloads)
- 🤗 yophis/DRM-T5-Base-qasc (model · 1 download)
- 🤗 yophis/DRM-T5-Base-quartz (model)
- 🤗 yophis/DRM-T5-Base-storycloze (model · 9 downloads)
- 🤗 yophis/DRM-T5-Base-wikiqa (model · 1 download)
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Advanced Neural Network Applications · Generative Adversarial Networks and Image Synthesis
Methods: Gated Linear Unit · Attention Is All You Need · Linear Layer · Byte Pair Encoding · SentencePiece · Multi-Head Attention · Layer Normalization · Inverse Square Root Schedule
