
TL;DR
This paper introduces a bidirectional neural activity transfer method to measure functional similarity and causal relationships between neural networks, with applications in model alignment, interpretability, and understanding task-specific encoding.
Contribution
The work presents a novel method for bidirectional neural activity transfer that can compare multiple models efficiently and reveal causal and representational differences.
Findings
The method can transfer behavior between frozen models, similar to model stitching.
It reveals differences in how recurrent models encode numbers.
It detects misalignment in fine-tuned models.
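Model stitching, which the first finding uses as a reference point, connects the early layers of one frozen network to the later layers of another through a small trainable layer. The sketch below is a minimal illustration of that idea and is not the paper's bidirectional transfer method. It assumes two tiny hypothetical ReLU networks with 2-D hidden layers (`W_A`, `W_B`, `V_B`, `front_A`, `front_B`, `head_B` are illustrative names). It also fits the stitching layer by least squares against B's own hidden activations, which simplifies the usual practice of training the stitch on the task loss.

```python
# Minimal model-stitching sketch with two tiny, hypothetical frozen ReLU networks.
# Assumption: the stitching layer is fitted by least squares on hidden activations,
# not trained on a task loss as in standard stitching.

def relu(v):
    return [max(0.0, x) for x in v]

def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]

# Frozen model A (only its front half is used) and frozen model B.
W_A = [[1.0, -1.0], [0.5, 2.0]]
# B computes A's hidden features in a permuted, positively rescaled basis.
W_B = [[0.25, 1.0], [3.0, -3.0]]
V_B = [1.0, -2.0]  # B's linear readout

def front_A(x):
    return relu(matvec(W_A, x))

def front_B(x):
    return relu(matvec(W_B, x))

def head_B(h):
    return sum(a * b for a, b in zip(V_B, h))

def fit_linear_stitch(h_src, h_tgt):
    """Least-squares 2x2 map M such that M @ h_src[k] approximates h_tgt[k]."""
    # Normal equations: M (S^T S) = T^T S, with samples stored as rows of S and T.
    sxx = [[sum(s[i] * s[j] for s in h_src) for j in range(2)] for i in range(2)]
    txs = [[sum(t[i] * s[j] for s, t in zip(h_src, h_tgt)) for j in range(2)]
           for i in range(2)]
    det = sxx[0][0] * sxx[1][1] - sxx[0][1] * sxx[1][0]
    inv = [[sxx[1][1] / det, -sxx[0][1] / det],
           [-sxx[1][0] / det, sxx[0][0] / det]]
    return [[sum(txs[i][k] * inv[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

inputs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, -1.0], [-1.0, 0.5]]
M = fit_linear_stitch([front_A(x) for x in inputs], [front_B(x) for x in inputs])

# Stitched network: A's front half -> stitch M -> B's readout; compare with B alone.
stitched = [head_B(matvec(M, front_A(x))) for x in inputs]
native = [head_B(front_B(x)) for x in inputs]
print(M)
print(max(abs(a - b) for a, b in zip(stitched, native)))
```

Because B's hidden layer here is an exact permuted, rescaled copy of A's, the fitted stitch recovers that map and the stitched network reproduces B's outputs. With real networks the fit is only approximate.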
Abstract
When can we say that two neural systems perform a task in the same way? What nuances do we miss when we fail to causally probe the representations of the systems, and how do we establish bidirectional causal relationships? In this work, we introduce a method that bidirectionally transfers neural activity between artificial neural networks and uses their resulting behavior as a measure of functional similarity. We first show that the method can be used to transfer the behavior from one frozen Neural Network (NN) to another in a manner similar to model stitching, and we show how the method can differ from correlative similarity measures like Representational Similarity Analysis. Next, we empirically and theoretically show how the method can be equivalent to model stitching when desired, or can take a more restrictive form that focuses on shared causal information; in both forms, it…
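To make the contrast with correlative measures concrete, here is a minimal pure-Python sketch of standard Representational Similarity Analysis. This is not the paper's method. For each network, it builds a representational dissimilarity matrix (RDM) of 1 minus the Pearson correlation between stimulus activation patterns, then compares the two RDMs with Spearman correlation. The activation values and the helper names (`pearson`, `rdm`, `ranks`, `rsa`) are illustrative assumptions.

```python
# Standard RSA between two networks' activations (a correlative measure).
import math
from itertools import combinations

def pearson(a, b):
    """Pearson r; assumes neither vector is constant."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return cov / (sa * sb)

def rdm(acts):
    """Upper triangle of the RDM: 1 - Pearson r for every pair of stimuli."""
    return [1.0 - pearson(acts[i], acts[j])
            for i, j in combinations(range(len(acts)), 2)]

def ranks(values):
    """1-based ranks, averaging tied values (as Spearman correlation requires)."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            out[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return out

def rsa(acts_a, acts_b):
    """Spearman correlation between the two RDMs."""
    return pearson(ranks(rdm(acts_a)), ranks(rdm(acts_b)))

# Rows are stimuli, columns are units (illustrative values).
acts_a = [[1.0, 2.0, 3.0], [2.0, 1.0, 0.0], [1.0, 3.0, 2.0], [0.0, 0.5, 2.0]]
# Same geometry: units permuted and scaled by 2.
acts_b = [[2 * row[2], 2 * row[0], 2 * row[1]] for row in acts_a]
print(round(rsa(acts_a, acts_b), 6))  # prints 1.0
```

A perfect RSA score only says that the two activity geometries match. It does not test whether either network can use the other's activity to produce behavior, which is the causal question that the transfer method is meant to probe.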
