Git Re-Basin: Merging Models modulo Permutation Symmetries
Samuel K. Ainsworth, Jonathan Hayase, Siddhartha Srinivasa

TL;DR
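This paper introduces algorithms that permute the hidden units of one neural network to align it with another, so that the two models can be merged in weight space. The results suggest that loss landscapes often contain (nearly) a single basin once permutation symmetries are accounted for, and they demonstrate linear mode connectivity across different architectures and training conditions.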
Contribution
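The paper presents three algorithms that permute the units of one model into alignment with a reference model, gives evidence that neural network loss landscapes often contain (nearly) a single basin after accounting for permutation symmetries, and explores mode connectivity phenomena.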
Findings
Neural network loss landscapes often contain a single basin after permutation alignment.
The paper demonstrates zero-barrier linear mode connectivity between independently trained ResNet models on CIFAR-10.
It identifies relationships between model width, training time, and mode connectivity phenomena.
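To make "zero-barrier linear mode connectivity" concrete, here is a minimal sketch of how a loss barrier along a straight line between two parameter vectors is commonly measured. The definition used (largest gap between the loss on the interpolated weights and the interpolated endpoint losses) is a common convention and an assumption here, not a quote from the paper. The function names and toy losses are hypothetical.

```python
import numpy as np

def loss_barrier(loss_fn, theta_a, theta_b, num_points=11):
    """Largest gap between the loss on the straight line from theta_a to
    theta_b and the linear interpolation of the two endpoint losses.
    (Assumed definition, for illustration only.)"""
    lambdas = np.linspace(0.0, 1.0, num_points)
    loss_a, loss_b = loss_fn(theta_a), loss_fn(theta_b)
    gaps = []
    for lam in lambdas:
        theta = (1.0 - lam) * theta_a + lam * theta_b
        baseline = (1.0 - lam) * loss_a + lam * loss_b
        gaps.append(loss_fn(theta) - baseline)
    return float(max(gaps))

def double_well(theta):
    # Toy non-convex loss with two separate minima at -1 and +1.
    return float(np.sum((theta ** 2 - 1.0) ** 2))

def bowl(theta):
    # Toy convex loss: a single basin.
    return float(np.sum(theta ** 2))

theta_a = np.array([-1.0])
theta_b = np.array([1.0])

# Two minima separated by a ridge: the barrier is positive.
barrier_double_well = loss_barrier(double_well, theta_a, theta_b)

# Both points in one convex basin: the barrier is zero.
barrier_bowl = loss_barrier(bowl, theta_a, theta_b)
```

In this toy setting the double-well loss has a positive barrier between its minima, while the convex bowl has none. A zero barrier between two trained networks would correspond to the second case.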
Abstract
The success of deep learning is due in large part to our ability to solve certain massive non-convex optimization problems with relative ease. Though non-convex optimization is NP-hard, simple algorithms -- often variants of stochastic gradient descent -- exhibit surprising effectiveness in fitting large neural networks in practice. We argue that neural network loss landscapes often contain (nearly) a single basin after accounting for all possible permutation symmetries of hidden units a la Entezari et al. 2021. We introduce three algorithms to permute the units of one model to bring them into alignment with a reference model in order to merge the two models in weight space. This transformation produces a functionally equivalent set of weights that lie in an approximately convex basin near the reference model. Experimentally, we demonstrate the single basin phenomenon across a variety…
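The permutation symmetry described in the abstract can be illustrated with a small sketch. It assumes a one-hidden-layer ReLU network, and the helper names (`mlp`, `permute_hidden`, `align_to_reference`) are hypothetical. The alignment step is an exhaustive search over permutations that minimizes squared weight distance. This is feasible only for tiny widths and is a stand-in, not one of the paper's three algorithms.

```python
import itertools
import numpy as np

def mlp(x, W1, b1, W2, b2):
    """One-hidden-layer ReLU network: W2 @ relu(W1 @ x + b1) + b2."""
    h = np.maximum(W1 @ x + b1, 0.0)
    return W2 @ h + b2

def permute_hidden(W1, b1, W2, perm):
    """Reorder hidden units: rows of W1 and b1, matching columns of W2."""
    return W1[perm], b1[perm], W2[:, perm]

def align_to_reference(ref, model):
    """Brute-force search for the hidden-unit order of `model` that is
    closest to `ref` in squared weight distance (illustrative only)."""
    W1_ref, b1_ref, W2_ref = ref
    best_perm, best_dist = None, np.inf
    for perm in itertools.permutations(range(W1_ref.shape[0])):
        W1, b1, W2 = permute_hidden(*model, list(perm))
        dist = (np.sum((W1 - W1_ref) ** 2)
                + np.sum((b1 - b1_ref) ** 2)
                + np.sum((W2 - W2_ref) ** 2))
        if dist < best_dist:
            best_perm, best_dist = perm, float(dist)
    return best_perm, best_dist

rng = np.random.default_rng(0)
d_in, d_hidden, d_out = 3, 4, 2
W1 = rng.normal(size=(d_hidden, d_in))
b1 = rng.normal(size=d_hidden)
W2 = rng.normal(size=(d_out, d_hidden))
b2 = rng.normal(size=d_out)

# Model B is model A with its hidden units shuffled.
shuffle = rng.permutation(d_hidden)
W1_b, b1_b, W2_b = permute_hidden(W1, b1, W2, shuffle)

# Shuffling hidden units changes the weights but not the function.
x = rng.normal(size=d_in)
same_function = np.allclose(mlp(x, W1, b1, W2, b2),
                            mlp(x, W1_b, b1_b, W2_b, b2))

# Aligning B to A recovers the inverse of the shuffle.
perm, dist = align_to_reference((W1, b1, W2), (W1_b, b1_b, W2_b))
```

Because model B here is an exact reshuffle of model A, the search recovers the inverse permutation with zero residual distance. Independently trained models would generally leave a nonzero distance after alignment.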
Taxonomy
Topics: Advanced Neural Network Applications · Adversarial Robustness in Machine Learning · Stochastic Gradient Optimization Techniques
Methods: Batch Normalization · Average Pooling · 1x1 Convolution · Kaiming Initialization · Global Average Pooling · Convolution · Residual Connection · Residual Block · Bottleneck Residual Block
