Revisiting Model Stitching to Compare Neural Representations

Yamini Bansal; Preetum Nakkiran; Boaz Barak

arXiv:2106.07682·cs.LG·June 16, 2021·22 cites

TL;DR

This paper revisits model stitching as a tool to analyze neural network representations, demonstrating its ability to reveal similarities between models and uncover structural properties of training minima.

Contribution

The paper extends model stitching as a methodology for comparing neural representations, argues that it reveals aspects of representations that measures such as centered kernel alignment (CKA) cannot, and uses it to gain new insights into training dynamics.

Findings

01

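Good networks of the same architecture, trained in very different ways (e.g., supervised vs. self-supervised learning), can be stitched to each other without a drop in performance.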

02

More data, width, or training time improves model compatibility.

03

SGD minima exhibit stitching connectivity similar to mode connectivity.

Abstract

We revisit and extend model stitching (Lenc & Vedaldi 2015) as a methodology to study the internal representations of neural networks. Given two trained and frozen models $A$ and $B$, we consider a "stitched model" formed by connecting the bottom-layers of $A$ to the top-layers of $B$, with a simple trainable layer between them. We argue that model stitching is a powerful and perhaps under-appreciated tool, which reveals aspects of representations that measures such as centered kernel alignment (CKA) cannot. Through extensive experiments, we use model stitching to obtain quantitative verifications for intuitive statements such as "good networks learn similar representations", by demonstrating that good networks of the same architecture, but trained in very different ways (e.g., supervised vs. self-supervised learning), can be stitched to each other without drop in performance. We also…
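The stitching construction described in the abstract can be illustrated with a small toy sketch. This is not the authors' experimental setup: the two "models" here are hypothetical NumPy networks, model B is built by rotating model A's hidden representation with a random orthogonal matrix, and the stitching layer is assumed to be a plain linear map trained by gradient descent on the task loss while both models stay frozen. All names (`bottom_A`, `top_B`, `S`) are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_hidden, n_classes, n = 8, 16, 3, 512

# Frozen model A: bottom layers (feature map) and top layers (linear head).
W_bottom_A = rng.normal(size=(d_in, d_hidden)) / np.sqrt(d_in)
W_top_A = rng.normal(size=(d_hidden, n_classes)) / np.sqrt(d_hidden)

# Frozen model B (toy assumption): same function as A, but its hidden
# representation is rotated by a random orthogonal matrix Q, so B's head
# expects features of the form h_A @ Q.
Q, _ = np.linalg.qr(rng.normal(size=(d_hidden, d_hidden)))
W_top_B = Q.T @ W_top_A

def bottom_A(x):
    """Bottom layers of A: input -> hidden representation."""
    return np.tanh(x @ W_bottom_A)

def top_B(h):
    """Top layers of B: hidden representation -> logits."""
    return h @ W_top_B

def cross_entropy(logits, y):
    """Mean softmax cross-entropy and the softmax probabilities."""
    z = logits - logits.max(axis=1, keepdims=True)
    log_p = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return -log_p[np.arange(len(y)), y].mean(), np.exp(log_p)

# Synthetic task: labels are model A's own predictions.
x = rng.normal(size=(n, d_in))
h = bottom_A(x)
y = (h @ W_top_A).argmax(axis=1)
onehot = np.eye(n_classes)[y]

# Stitching layer S: the only trainable parameter, initialised to identity
# (i.e. feeding A's features into B's head unchanged).
S = np.eye(d_hidden)
loss_before, _ = cross_entropy(top_B(h @ S), y)
acc_before = (top_B(h @ S).argmax(axis=1) == y).mean()

for _ in range(2000):
    _, p = cross_entropy(top_B(h @ S), y)
    # Gradient of the loss w.r.t. S only; W_bottom_A and W_top_B are never updated.
    grad_S = h.T @ (p - onehot) @ W_top_B.T / n
    S -= 0.05 * grad_S

loss_after, _ = cross_entropy(top_B(h @ S), y)
acc_after = (top_B(h @ S).argmax(axis=1) == y).mean()

print(f"identity stitch: loss={loss_before:.3f} acc={acc_before:.3f}")
print(f"trained stitch:  loss={loss_after:.3f} acc={acc_after:.3f}")
```

In this toy setting a perfect stitching layer exists by construction (`S = Q`), so the gap between the identity stitch and the trained stitch shows what the trainable layer contributes. For real networks, the accuracy of the stitched model relative to the original models is the quantity of interest.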

Videos

Revisiting Model Stitching to Compare Neural Representations · SlidesLive

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Human Pose and Action Recognition · Advanced Neural Network Applications

Methods: Stochastic Gradient Descent