From Circuit Evidence to Mechanistic Theory: An Inductive Logic Approach
Nura Aljaafari, Danilo S. Carvalho, Andre Freitas

TL;DR
This paper introduces a formal framework combining causal signatures and inductive logic programming to enable cumulative, comparable, and scalable mechanistic interpretability of neural networks.
Contribution
It proposes a novel formal infrastructure for circuit interpretation, integrating causal and architectural signatures for better comparison and transferability.
Findings
CFS reveals distinct computational strategies across tasks.
ILP signatures outperform graph kernel and feature-vector baselines.
Supports transfer across model scales and architectures.
Abstract
Mechanistic interpretability produces circuit-level causal analyses of neural network behaviour, but discovered circuits often remain isolated experimental artefacts: there is no shared formal representation for what circuits compute, how they relate, or when two findings provide evidence for the same mechanism. This work provides a formal infrastructure for cumulative mechanistic science by treating circuit interpretation as inductive theory construction. Each circuit is characterised at two levels: a Causal Functional Signature (CFS), which grounds component behaviour in causal attribution evidence and token role profiles, and an architectural signature, learned by inductive logic programming (ILP) from scale-invariant structural predicates. Together, these constitute a formal coherence layer that makes mechanistic claims explicit, comparable via…
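The notion of "scale-invariant structural predicates" can be illustrated with a toy sketch. This is not the authors' method, and the abstract does not specify their predicates. Every name below (Component, structural_facts, the node/edge fact tuples, the depth-bin encoding, and the Jaccard comparison) is a hypothetical choice made for illustration. The idea shown is this: if layer positions are expressed as coarse relative-depth bins rather than absolute indices, circuits from models of different depths can map to the same ground facts. An ILP system would then learn rules over facts of this kind, and a simple set overlap stands in here for a proper comparison.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Component:
    layer: int  # 0-indexed layer of the component in its model
    kind: str   # hypothetical component label, e.g. "attn" or "mlp"


def structural_facts(edges, n_layers, n_bins=3):
    """Map a circuit, given as (src, dst) Component pairs, to a set of
    ground facts whose positions are relative-depth bins (hypothetical
    encoding), so circuits from shallower and deeper models can match."""
    def depth_bin(layer):
        # Integer arithmetic avoids float rounding; clamp to the last bin.
        return min(layer * n_bins // n_layers, n_bins - 1)

    facts = set()
    for src, dst in edges:
        s, d = depth_bin(src.layer), depth_bin(dst.layer)
        facts.add(("node", src.kind, s))
        facts.add(("node", dst.kind, d))
        facts.add(("edge", src.kind, s, dst.kind, d))
    return facts


def jaccard(a, b):
    """Set overlap used here as a stand-in similarity between fact sets."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


# Usage: the same attention-to-MLP pattern placed proportionally in a
# 12-layer and a 24-layer model yields identical facts.
small = [(Component(2, "attn"), Component(9, "mlp"))]
large = [(Component(4, "attn"), Component(18, "mlp"))]
other = [(Component(2, "mlp"), Component(9, "mlp"))]

print(jaccard(structural_facts(small, 12), structural_facts(large, 24)))  # 1.0
print(jaccard(structural_facts(small, 12), structural_facts(other, 12)))  # 0.2
```

The binning makes the small and large circuits indistinguishable at this level of description, while a circuit whose source component has a different kind shares only one of its five distinct facts with the small circuit.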
