Revisiting Model Stitching to Compare Neural Representations
Yamini Bansal, Preetum Nakkiran, Boaz Barak

TL;DR
This paper revisits model stitching as a tool to analyze neural network representations, demonstrating its ability to reveal similarities between models and uncover structural properties of training minima.
Contribution
The paper extends the model stitching methodology for comparing neural representations, showing that it reveals aspects of representations that measures such as centered kernel alignment (CKA) cannot, and yielding new insights into the minima found by training.
Findings
Models trained differently can be stitched without performance loss.
More data, width, or training time improves model compatibility.
SGD minima exhibit stitching connectivity similar to mode connectivity.
Abstract
We revisit and extend model stitching (Lenc & Vedaldi 2015) as a methodology to study the internal representations of neural networks. Given two trained and frozen models A and B, we consider a "stitched model" formed by connecting the bottom-layers of A to the top-layers of B, with a simple trainable layer between them. We argue that model stitching is a powerful and perhaps under-appreciated tool, which reveals aspects of representations that measures such as centered kernel alignment (CKA) cannot. Through extensive experiments, we use model stitching to obtain quantitative verifications for intuitive statements such as "good networks learn similar representations", by demonstrating that good networks of the same architecture, but trained in very different ways (e.g.: supervised vs. self-supervised learning), can be stitched to each other without drop in performance. We also…
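The stitching construction described in the abstract can be illustrated with a minimal sketch. Everything below is a toy assumption rather than the authors' setup: the two "models" are tiny fixed networks (a tanh bottom layer and a linear top layer), the names `A_BOTTOM`, `B_BOTTOM`, `B_TOP`, `make_data`, and `train_stitch` are hypothetical, the stitching layer is a plain 2x2 linear map, and the task targets are simply model B's own outputs, so the stitched model's loss directly measures the stitching penalty relative to B. Only the stitching layer is updated; both models stay frozen.

```python
import math
import random

# Toy frozen models (assumed for illustration, not from the paper):
# bottom(x) = tanh(W x), top(h) = readout . h
A_BOTTOM = [[1.0, 0.5], [-0.5, 1.0]]   # bottom layer of model A (frozen)
B_BOTTOM = [[0.3, -1.0], [1.0, 0.3]]   # bottom layer of model B (frozen)
B_TOP = [0.8, -0.6]                    # top layer of model B (frozen)


def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def bottom(W, x):
    return [math.tanh(z) for z in matvec(W, x)]


def model_b(x):
    """Full model B: its own bottom layer followed by its own top layer."""
    return dot(B_TOP, bottom(B_BOTTOM, x))


def stitched(S, x):
    """Bottom of A -> trainable stitching layer S -> top of B."""
    return dot(B_TOP, matvec(S, bottom(A_BOTTOM, x)))


def mean_squared_error(S, data):
    return sum((stitched(S, x) - y) ** 2 for x, y in data) / len(data)


def make_data(n, seed=0):
    """Toy task: inputs in [-1, 1]^2, targets given by model B's outputs."""
    rng = random.Random(seed)
    xs = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(n)]
    return [(x, model_b(x)) for x in xs]


def train_stitch(data, steps=500, lr=0.1):
    """Gradient descent on the task loss, updating only the stitching layer."""
    S = [[1.0, 0.0], [0.0, 1.0]]  # start from the identity map
    for _ in range(steps):
        grad = [[0.0, 0.0], [0.0, 0.0]]
        for x, y in data:
            h = bottom(A_BOTTOM, x)        # frozen features from A
            err = stitched(S, x) - y
            for i in range(2):
                for j in range(2):
                    # d(output)/dS[i][j] = B_TOP[i] * h[j]
                    grad[i][j] += 2.0 * err * B_TOP[i] * h[j] / len(data)
        for i in range(2):
            for j in range(2):
                S[i][j] -= lr * grad[i][j]
    return S


if __name__ == "__main__":
    data = make_data(64)
    identity = [[1.0, 0.0], [0.0, 1.0]]
    print("loss with identity stitch:", mean_squared_error(identity, data))
    print("loss with trained stitch: ", mean_squared_error(train_stitch(data), data))
```

In this sketch, a small remaining loss after training would correspond to the two toy models being "stitchable"; real experiments would use trained networks, a held-out test set, and the task's actual labels rather than model B's outputs.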
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Human Pose and Action Recognition · Advanced Neural Network Applications
Methods: Stochastic Gradient Descent
