Model Merging: Foundations and Algorithms

Donato Crisostomi

arXiv:2605.01580 · cs.LG · May 5, 2026

TL;DR

This thesis explores model merging as an alternative to model replacement in deep learning, proposing algorithms and theoretical insights for combining neural networks directly in weight space without access to training data.

Contribution

The thesis introduces novel algorithms and theoretical frameworks for model merging in both single-task and multi-task settings, including C$^2$M$^3$, TSV-Merge, MASS, and MERGE$^3$, advancing the understanding and efficiency of model composition.

Findings

01

C$^2$M$^3$ aligns models into a shared parameter space for meaningful weight averaging.
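Averaging weights is only meaningful once hidden units correspond: two networks can compute the same function with their neurons stored in different orders. The sketch below is a toy illustration of that alignment step, not the C$^2$M$^3$ algorithm itself. It assumes a bias-free one-hidden-layer ReLU MLP and finds the best hidden-unit permutation by brute force, which is only feasible for tiny widths (practical methods solve an assignment problem instead). The function names `mlp` and `align_to` are hypothetical.

```python
import itertools

import numpy as np


def mlp(x, w1, w2):
    """Bias-free one-hidden-layer ReLU MLP: x -> w2 @ relu(w1 @ x)."""
    return w2 @ np.maximum(w1 @ x, 0.0)


def align_to(w1_a, w2_a, w1_b, w2_b):
    """Permute model B's hidden units to best match model A (exhaustive search)."""
    best_perm, best_cost = None, np.inf
    for perm in itertools.permutations(range(w1_a.shape[0])):
        p = list(perm)
        # Rows of w1 and columns of w2 index the same hidden units.
        cost = np.sum((w1_a - w1_b[p]) ** 2) + np.sum((w2_a - w2_b[:, p]) ** 2)
        if cost < best_cost:
            best_perm, best_cost = p, cost
    return w1_b[best_perm], w2_b[:, best_perm]


rng = np.random.default_rng(0)
w1_a, w2_a = rng.normal(size=(4, 3)), rng.normal(size=(2, 4))
shuffle = [2, 0, 3, 1]
w1_b, w2_b = w1_a[shuffle], w2_a[:, shuffle]  # same function, reordered neurons
x = rng.normal(size=(3, 8))  # batch of 8 inputs stored as columns

reference = mlp(x, w1_a, w2_a)
naive = mlp(x, (w1_a + w1_b) / 2, (w2_a + w2_b) / 2)
w1_c, w2_c = align_to(w1_a, w2_a, w1_b, w2_b)
aligned = mlp(x, (w1_a + w1_c) / 2, (w2_a + w2_c) / 2)

print(np.allclose(mlp(x, w1_b, w2_b), reference))  # True
print(np.allclose(aligned, reference))  # True
print(np.abs(naive - reference).max())  # size of the naive-averaging error
```

Model B is functionally identical to model A, and after alignment their average reproduces it. The naive average, by contrast, mixes unrelated neurons and in general changes the function.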

02

Task vectors exhibit low-rank gradient structures enabling effective compression.
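A task vector is the difference between a fine-tuned model's weights and the pretrained weights it started from. The sketch below is a generic illustration of compressing one layer's task vector with a truncated SVD, not the TSV-Merge procedure. The synthetic update is built to be approximately rank 2, which is an assumption made for the demo, and the helpers `task_vector` and `low_rank` are hypothetical.

```python
import numpy as np


def task_vector(w_pre, w_ft):
    """Task vector of one layer: fine-tuned minus pretrained weights."""
    return w_ft - w_pre


def low_rank(delta, k):
    """Best rank-k approximation of a layer's task vector via truncated SVD."""
    u, s, vt = np.linalg.svd(delta, full_matrices=False)
    return u[:, :k], s[:k], vt[:k]


rng = np.random.default_rng(0)
m, n, k = 64, 32, 2
w_pre = rng.normal(size=(m, n))
# Hypothetical fine-tuning update: a rank-2 signal plus small noise.
update = rng.normal(size=(m, k)) @ rng.normal(size=(k, n)) + 0.01 * rng.normal(size=(m, n))
w_ft = w_pre + update

delta = task_vector(w_pre, w_ft)
u, s, vt = low_rank(delta, k)
approx = (u * s) @ vt  # scale each left singular vector by its singular value
rel_err = np.linalg.norm(delta - approx) / np.linalg.norm(delta)
stored = u.size + s.size + vt.size

print(rel_err)
print(stored, delta.size)  # 194 2048
```

When the update really is close to low rank, keeping k components stores k(m + n + 1) numbers instead of mn, here roughly a tenth, at a small reconstruction error.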

03

MERGE$^3$ reduces evaluation costs by up to 50 times while maintaining solution quality.

Abstract

Modern deep learning usually treats models as separate artifacts: trained independently, specialized for particular purposes, and replaced when improved versions appear. This thesis studies model merging as an alternative paradigm: combining independently trained neural networks directly in weight space, with little or no optimization and without requiring access to the original training data. The thesis considers two main regimes. In the single-task setting, where models share an objective but differ in initialization, we introduce C$^2$M$^3$, a cycle-consistent merging algorithm based on Frank-Wolfe optimization. C$^2$M$^3$ aligns multiple networks into a shared, reference-free parameter space, making weight averaging meaningful without privileging any individual model. In the multi-task setting, where models are fine-tuned for different downstream tasks from a common pretrained…
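The abstract does not spell out C$^2$M$^3$'s objective or its Frank-Wolfe updates, so the sketch below only illustrates the cycle-consistency property it relies on. Assume each model i is mapped into a shared universe space by a permutation matrix P_i, and every pairwise map from model i to model j is derived as P_j^T P_i. Composing maps around any cycle then gives the identity, and averaging in the universe treats all models symmetrically. The maps are taken as known here (they are constructed directly rather than optimized), and all names are hypothetical.

```python
import numpy as np


def perm_matrix(p):
    """Permutation matrix P with (P @ v)[i] == v[p[i]]."""
    return np.eye(len(p))[p]


rng = np.random.default_rng(0)
n_models, width = 3, 5
w_base = rng.normal(size=(width, 4))

# Hypothetical setup: each model stores the same hidden units in a different order.
orders = [perm_matrix(rng.permutation(width)) for _ in range(n_models)]
weights = [q @ w_base for q in orders]

# Model-to-universe maps, assumed known here instead of being optimized.
to_universe = [q.T for q in orders]


def pairwise(i, j):
    """Map from model i's neuron order to model j's, routed through the universe."""
    return to_universe[j].T @ to_universe[i]


# Composing 0 -> 1 -> 2 -> 0 returns to the identity by construction.
cycle = pairwise(2, 0) @ pairwise(1, 2) @ pairwise(0, 1)
print(np.allclose(cycle, np.eye(width)))  # True

# Reference-free merge: average all models after mapping them into the universe.
merged = sum(p @ w for p, w in zip(to_universe, weights)) / n_models
print(np.allclose(merged, w_base))  # True
```

Routing every pairwise map through the universe is what makes the construction cycle-consistent by design; independently estimated pairwise permutations carry no such guarantee.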
