MINAR: Mechanistic Interpretability for Neural Algorithmic Reasoning
Jesse He, Helen Jenne, Max Vargas, Davis Brown, Gal Mishne, Yusu Wang, Henry Kvinge

TL;DR
MINAR is a mechanistic interpretability toolbox for neural algorithmic reasoning. It discovers neuron-level circuits in GNNs trained on algorithmic tasks and sheds light on how circuits form and are pruned during training, and on how they are reused across tasks.
Contribution
MINAR adapts attribution patching methods from mechanistic interpretability to GNNs for circuit discovery, advancing the understanding of neural algorithmic reasoning and of circuit reuse in multi-task training.
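To make the idea of attribution patching concrete, here is a minimal sketch on a toy two-layer network rather than a GNN. It is not MINAR's implementation: the network, the function names (`attribution_scores`, `patch_effect`), the weights, and the inputs are all hypothetical, and the sign convention (clean minus corrupted activation, times the gradient taken on the corrupted run) is one common choice assumed here. Each hidden neuron gets a first-order estimate of how much the output would change if its activation were patched from a clean run into a corrupted run.

```python
def relu(x):
    return x if x > 0.0 else 0.0

def hidden(W1, x):
    # Hidden activations h = relu(W1 @ x), written with plain lists.
    return [relu(sum(w * xi for w, xi in zip(row, x))) for row in W1]

def readout(w2, h):
    # Scalar output y = w2 . h (linear readout, chosen to keep gradients simple).
    return sum(w * hj for w, hj in zip(w2, h))

def attribution_scores(W1, w2, x_clean, x_corrupt):
    # First-order estimate of patching each hidden neuron:
    # (h_clean_j - h_corrupt_j) * dy/dh_j, gradient taken on the corrupted run.
    h_clean = hidden(W1, x_clean)
    h_corrupt = hidden(W1, x_corrupt)
    grad = list(w2)  # for a linear readout, dy/dh_j is just w2_j
    return [(hc - hk) * g for hc, hk, g in zip(h_clean, h_corrupt, grad)]

def patch_effect(W1, w2, x_clean, x_corrupt, j):
    # Ground truth for comparison: actually patch neuron j and rerun the readout.
    h_clean = hidden(W1, x_clean)
    h_patched = hidden(W1, x_corrupt)
    baseline = readout(w2, h_patched)
    h_patched[j] = h_clean[j]
    return readout(w2, h_patched) - baseline

# Hypothetical toy weights and inputs, for illustration only.
W1 = [[1.0, 0.0], [0.0, 1.0], [1.0, -1.0]]
w2 = [2.0, -1.0, 0.5]
x_clean = [1.0, 2.0]
x_corrupt = [3.0, 2.0]

scores = attribution_scores(W1, w2, x_clean, x_corrupt)
effects = [patch_effect(W1, w2, x_clean, x_corrupt, j) for j in range(len(w2))]
```

Because the readout in this sketch is linear, the estimated scores coincide with the true patching effects. In a deeper network the scores are only a linear approximation, but they are much cheaper than patching each component separately, since they need one forward pass per input and one backward pass rather than one forward pass per patched component.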
Findings
MINAR successfully recovers faithful neuron-level circuits from GNNs.
Circuit formation and pruning are observed during training.
GNNs reuse circuit components for related tasks in multi-task settings.
Abstract
The recent field of neural algorithmic reasoning (NAR) studies the ability of graph neural networks (GNNs) to emulate classical algorithms like Bellman-Ford, a phenomenon known as algorithmic alignment. At the same time, recent advances in large language models (LLMs) have spawned the study of mechanistic interpretability, which aims to identify granular model components like circuits that perform specific computations. In this work, we introduce Mechanistic Interpretability for Neural Algorithmic Reasoning (MINAR), an efficient circuit discovery toolbox that adapts attribution patching methods from mechanistic interpretability to the GNN setting. We show through two case studies that MINAR recovers faithful neuron-level circuits from GNNs trained on algorithmic tasks. Our study sheds new light on the process of circuit formation and pruning during training, as well as giving new…
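The alignment between GNNs and Bellman-Ford mentioned in the abstract can be illustrated by writing the algorithm as rounds of message passing: each edge sends a message (the sender's distance plus the edge weight) and each node aggregates its incoming messages with a minimum. The sketch below is the classical algorithm in that form, not a trained GNN or anything taken from the paper; the function name and the example graph are hypothetical.

```python
import math

def bellman_ford_message_passing(num_nodes, edges, source):
    """Single-source shortest paths; edges is a list of directed (u, v, weight)."""
    dist = [math.inf] * num_nodes
    dist[source] = 0.0
    for _ in range(num_nodes - 1):
        # One synchronous round: messages are computed from the previous
        # round's distances, then min-aggregated at each receiving node.
        new_dist = list(dist)
        for u, v, w in edges:
            message = dist[u] + w
            if message < new_dist[v]:
                new_dist[v] = message
        if new_dist == dist:
            break  # no node changed, so the distances have converged
        dist = new_dist
    return dist

# Hypothetical example graph with four nodes.
edges = [(0, 1, 4.0), (0, 2, 1.0), (2, 1, 2.0), (1, 3, 1.0)]
distances = bellman_ford_message_passing(4, edges, source=0)
```

The update has the same shape as a GNN layer with a min aggregator, which is the sense in which a network's architecture can align with the algorithm it is trained to emulate.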
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Advanced Graph Neural Networks · Adversarial Robustness in Machine Learning
