$C^2M^3$: Cycle-Consistent Multi-Model Merging
Donato Crisostomi, Marco Fumero, Daniele Baieri, Florian Bernard, Emanuele Rodolà

TL;DR
This paper introduces a data-free method for merging multiple neural networks in weight space. It optimizes neuron permutations globally across all layers and enforces cycle consistency among them, and it is evaluated on sets of models spanning varying architectures and datasets.
Contribution
It proposes a cycle-consistent permutation optimization technique for merging multiple neural networks without data, so that circular compositions of permutations can be computed without accumulating error along the path.
Findings
Cycle consistency improves merging accuracy.
Global permutation optimization outperforms local methods.
Activation renormalization enhances merging results.
Abstract
In this paper, we present a novel data-free method for merging neural networks in weight space. Differently from most existing works, our method optimizes for the permutations of network neurons globally across all layers. This allows us to enforce cycle consistency of the permutations when merging models, allowing circular compositions of permutations to be computed without accumulating error along the path. We qualitatively and quantitatively motivate the need for such a constraint, showing its benefits when merging sets of models in scenarios spanning varying architectures and datasets. We finally show that, when coupled with activation renormalization, our approach yields the best results in the task.
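The cycle-consistency constraint mentioned in the abstract can be illustrated with a small numerical sketch. The code below is not the paper's algorithm: it assumes, purely for illustration, that each model's neurons are mapped to a shared reference ordering and that pairwise permutations are built by composing those maps. The names `random_permutation_matrix`, `pairwise_from_reference`, and `cycle_error` are hypothetical helpers introduced here.

```python
import numpy as np

def random_permutation_matrix(n, rng):
    """Return an n x n permutation matrix drawn at random."""
    return np.eye(n, dtype=int)[rng.permutation(n)]

def pairwise_from_reference(u_src, u_dst):
    """Map source neurons to the shared reference ordering, then to the destination.

    Assumption for illustration: u_x sends model X's neuron order to a common
    reference order, so its transpose (its inverse) sends it back.
    """
    return u_dst.T @ u_src

def cycle_error(p_ab, p_bc, p_ca):
    """Frobenius distance between the cycle A -> B -> C -> A and the identity."""
    n = p_ab.shape[0]
    return np.linalg.norm(p_ca @ p_bc @ p_ab - np.eye(n))

rng = np.random.default_rng(0)
n = 8

# Pairwise permutations chosen independently: nothing forces the cycle to close.
independent = [random_permutation_matrix(n, rng) for _ in range(3)]
print("independent cycle error:", cycle_error(*independent))

# Pairwise permutations factored through a shared reference: the cycle closes exactly.
u_a, u_b, u_c = (random_permutation_matrix(n, rng) for _ in range(3))
p_ab = pairwise_from_reference(u_a, u_b)
p_bc = pairwise_from_reference(u_b, u_c)
p_ca = pairwise_from_reference(u_c, u_a)
print("consistent cycle error:", cycle_error(p_ab, p_bc, p_ca))
```

In the factored case the product collapses term by term, since each permutation matrix multiplied by its transpose is the identity, so the composed cycle returns every neuron to its starting position regardless of path length.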
Taxonomy
Topics: Model-Driven Software Engineering Techniques
