Git Re-Basin: Merging Models modulo Permutation Symmetries

Samuel K. Ainsworth; Jonathan Hayase; Siddhartha Srinivasa

arXiv:2209.04836 · cs.LG · March 3, 2023 · 32 cites

TL;DR

This paper introduces three algorithms that permute the hidden units of one neural network to align it with a reference model, so that the two can be merged in weight space. The results suggest that loss landscapes often contain (nearly) a single basin once permutation symmetries are accounted for.

Contribution

The paper presents three permutation-alignment algorithms, each producing a functionally equivalent set of weights that lies in an approximately convex basin near the reference model, and uses them to study linear mode connectivity between independently trained networks.

Findings

01

Neural network loss landscapes often contain (nearly) a single basin once permutation symmetries of hidden units are accounted for.

02

Demonstrated zero-barrier linear mode connectivity between independently trained ResNet models on CIFAR-10.

03

Identified relationships between model width, training time, and mode connectivity phenomena.
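
To make "zero-barrier linear mode connectivity" concrete, here is a minimal sketch of how a loss barrier along the straight line between two parameter vectors might be measured. It assumes one common definition (the highest loss on the path minus the mean of the two endpoint losses); the function names and the two toy losses are hypothetical stand-ins, not taken from the paper or its code.

```python
from typing import Callable, List, Sequence

Params = Sequence[float]


def interpolate(theta_a: Params, theta_b: Params, lam: float) -> List[float]:
    """Point on the straight line between two parameter vectors."""
    return [(1.0 - lam) * a + lam * b for a, b in zip(theta_a, theta_b)]


def loss_barrier(
    loss: Callable[[Params], float],
    theta_a: Params,
    theta_b: Params,
    num_points: int = 21,
) -> float:
    """Highest loss on the sampled path minus the mean endpoint loss.

    Assumed definition for illustration; a value near zero means the two
    solutions are linearly mode connected at the sampled resolution.
    """
    endpoint_mean = 0.5 * (loss(theta_a) + loss(theta_b))
    path_losses = [
        loss(interpolate(theta_a, theta_b, i / (num_points - 1)))
        for i in range(num_points)
    ]
    return max(path_losses) - endpoint_mean


def double_well(theta: Params) -> float:
    """Toy loss with two separate minima at -1 and +1 (hypothetical)."""
    return sum((t * t - 1.0) ** 2 for t in theta)


def bowl(theta: Params) -> float:
    """Toy convex loss with a single basin (hypothetical)."""
    return sum(t * t for t in theta)


# Two minima separated by a ridge: the path crosses a loss barrier.
barrier_two_basins = loss_barrier(double_well, [-1.0], [1.0])

# Two points of equal loss in one convex basin: no barrier between them.
barrier_one_basin = loss_barrier(bowl, [-1.0], [1.0])

print(barrier_two_basins, barrier_one_basin)
```

In this toy setting the double-well path has a positive barrier while the convex bowl has none, which is the distinction the findings above refer to.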

Abstract

The success of deep learning is due in large part to our ability to solve certain massive non-convex optimization problems with relative ease. Though non-convex optimization is NP-hard, simple algorithms -- often variants of stochastic gradient descent -- exhibit surprising effectiveness in fitting large neural networks in practice. We argue that neural network loss landscapes often contain (nearly) a single basin after accounting for all possible permutation symmetries of hidden units a la Entezari et al. 2021. We introduce three algorithms to permute the units of one model to bring them into alignment with a reference model in order to merge the two models in weight space. This transformation produces a functionally equivalent set of weights that lie in an approximately convex basin near the reference model. Experimentally, we demonstrate the single basin phenomenon across a variety…
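
The permutation symmetry described in the abstract can be illustrated on a one-hidden-layer network. The sketch below is an assumption-laden toy, not one of the paper's three algorithms: it shows that reordering hidden units leaves the function unchanged, then recovers an aligning permutation by exhaustive search over all orderings (feasible only for tiny widths) and averages the aligned weights. All names and weight values are hypothetical.

```python
from itertools import permutations
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def mlp(w1: Matrix, w2: Matrix, x: Sequence[float]) -> List[float]:
    """One hidden layer without biases: y = W2 @ relu(W1 @ x)."""
    hidden = [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in w1]
    return [sum(w * h for w, h in zip(row, hidden)) for row in w2]


def permute_hidden(w1: Matrix, w2: Matrix, perm: Sequence[int]) -> Tuple[Matrix, Matrix]:
    """Reorder hidden units: new unit i is old unit perm[i].

    Rows of W1 and columns of W2 move together, so the function is unchanged.
    """
    return [w1[p] for p in perm], [[row[p] for p in perm] for row in w2]


def squared_distance(m1: Matrix, m2: Matrix) -> float:
    return sum((a - b) ** 2 for r1, r2 in zip(m1, m2) for a, b in zip(r1, r2))


def match_to_reference(ref_w1: Matrix, ref_w2: Matrix, w1: Matrix, w2: Matrix) -> Tuple[int, ...]:
    """Exhaustive stand-in for alignment: try every ordering of hidden units
    and keep the one closest to the reference in weight space."""
    def cost(perm: Tuple[int, ...]) -> float:
        p1, p2 = permute_hidden(w1, w2, perm)
        return squared_distance(ref_w1, p1) + squared_distance(ref_w2, p2)
    return min(permutations(range(len(w1))), key=cost)


def average(m1: Matrix, m2: Matrix) -> Matrix:
    return [[0.5 * (a + b) for a, b in zip(r1, r2)] for r1, r2 in zip(m1, m2)]


# Reference model A (hypothetical weights).
w1_a = [[1.0, -2.0], [0.5, 3.0], [-1.5, 0.25]]
w2_a = [[2.0, -1.0, 0.5]]

# Model B computes the same function with its hidden units shuffled.
w1_b, w2_b = permute_hidden(w1_a, w2_a, (2, 0, 1))

x = [0.3, -0.7]
out_a = mlp(w1_a, w2_a, x)
out_b = mlp(w1_b, w2_b, x)

# Naive merge: average the weights without aligning units first.
out_naive = mlp(average(w1_a, w1_b), average(w2_a, w2_b), x)

# Aligned merge: permute B toward A, then average.
best_perm = match_to_reference(w1_a, w2_a, w1_b, w2_b)
w1_b_aligned, w2_b_aligned = permute_hidden(w1_b, w2_b, best_perm)
out_merged = mlp(average(w1_a, w1_b_aligned), average(w2_a, w2_b_aligned), x)

print(out_a, out_b, out_naive, out_merged)
```

Here model B is an exact shuffle of model A, so alignment recovers A's weights and the merged model matches both inputs, while the naive average does not. Independently trained networks would only align approximately, and exhaustive search scales factorially with width, which is why a practical method needs something more efficient than this sketch.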

Code & Models

Videos

Git Re-Basin: Merging Models modulo Permutation Symmetries · SlidesLive

Taxonomy

Topics: Advanced Neural Network Applications · Adversarial Robustness in Machine Learning · Stochastic Gradient Optimization Techniques

Methods: Batch Normalization · Average Pooling · 1x1 Convolution · Kaiming Initialization · Global Average Pooling · Convolution · Residual Connection · Residual Block · Bottleneck Residual Block