Model Alignment Search

Satchel Grant

arXiv:2501.06164 · cs.LG · November 4, 2025

TL;DR

This paper introduces a bidirectional neural activity transfer method to measure functional similarity and causal relationships between neural networks, with applications in model alignment, interpretability, and understanding task-specific encoding.

Contribution

The work presents a novel method for bidirectional neural activity transfer that can compare multiple models efficiently and reveal causal and representational differences.

Findings

01 Transfers behavior from one frozen network to another, in a manner similar to model stitching

02 Reveals differences in how recurrent models encode numbers

03 Detects misalignment in fine-tuned models

Abstract

When can we say that two neural systems perform a task in the same way? What nuances do we miss when we fail to causally probe the representations of the systems, and how do we establish bidirectional causal relationships? In this work, we introduce a method that bidirectionally transfers neural activity between artificial neural networks and uses their resulting behavior as a measure of functional similarity. We first show that the method can be used to transfer the behavior from one frozen Neural Network (NN) to another in a manner similar to model stitching, and we show how the method can differ from correlative similarity measures like Representational Similarity Analysis. Next, we empirically and theoretically show how the method can be equivalent to model stitching when desired, or it can take a form that has a more restrictive focus to shared causal information; in both forms, it…
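To make the idea of bidirectional activity transfer concrete, here is a minimal toy sketch in Python. It is not the paper's procedure. It assumes two hypothetical frozen linear "networks" that solve the same task, and it swaps in least-squares-fitted linear maps as the alignment between their hidden states. All names (`make_toy_model`, `fit_map`, `transfer_accuracy`) and all sizes are illustrative assumptions. The one point taken from the abstract is the scoring rule: the receiving model's behavior after the transfer serves as the measure of functional similarity, in both directions.

```python
import numpy as np

rng = np.random.default_rng(0)
D_LATENT, D_HIDDEN, N_CLASSES = 5, 5, 3  # illustrative sizes, not from the paper

# Shared toy task: the label is the argmax of a fixed linear function of a latent z.
W_task = rng.normal(size=(D_LATENT, N_CLASSES))

def make_toy_model():
    """Hypothetical frozen 'network': an encoder into hidden space plus a readout."""
    enc = rng.normal(size=(D_LATENT, D_HIDDEN))  # latent -> hidden
    readout = np.linalg.pinv(enc) @ W_task       # hidden -> logits
    return enc, readout

def hidden(model, z):
    """Hidden activity of the model on inputs z."""
    return z @ model[0]

def behave(model, h):
    """Behavior: the class the model predicts when it continues from hidden state h."""
    return np.argmax(h @ model[1], axis=1)

def fit_map(h_src, h_dst):
    """Least-squares linear map from source to destination hidden states
    (a stand-in for a learned alignment, assumed for this sketch)."""
    m, *_ = np.linalg.lstsq(h_src, h_dst, rcond=None)
    return m

def transfer_accuracy(src, dst, m, z, labels):
    """Run src up to its hidden layer, map the activity into dst, score dst's behavior."""
    return float(np.mean(behave(dst, hidden(src, z) @ m) == labels))

model_a, model_b = make_toy_model(), make_toy_model()
z_train = rng.normal(size=(200, D_LATENT))
z_test = rng.normal(size=(500, D_LATENT))
y_test = np.argmax(z_test @ W_task, axis=1)

# Fit one map per direction so that activity can be transferred both ways.
m_ab = fit_map(hidden(model_a, z_train), hidden(model_b, z_train))
m_ba = fit_map(hidden(model_b, z_train), hidden(model_a, z_train))

acc_ab = transfer_accuracy(model_a, model_b, m_ab, z_test, y_test)
acc_ba = transfer_accuracy(model_b, model_a, m_ba, z_test, y_test)

# Control: an unfitted random map should not preserve behavior.
m_rand = rng.normal(size=(D_HIDDEN, D_HIDDEN))
acc_rand = transfer_accuracy(model_a, model_b, m_rand, z_test, y_test)

print(f"A -> B: {acc_ab:.3f}  B -> A: {acc_ba:.3f}  random map: {acc_rand:.3f}")
```

In this toy setting both directions preserve behavior because the two models encode the same latent information up to an invertible linear change of basis. With real networks the map would have to be learned, and a gap between the two directions would itself be informative.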

Taxonomy

Topics: Model-Driven Software Engineering Techniques · Semantic Web and Ontologies · Simulation Techniques and Applications

Methods: Mixing Adam and SGD · ALIGN