From Circuit Evidence to Mechanistic Theory: An Inductive Logic Approach

Nura Aljaafari; Danilo S. Carvalho; Andre Freitas

arXiv:2605.21303 · cs.LG · May 21, 2026

TL;DR

This paper introduces a formal framework combining causal signatures and inductive logic programming to enable cumulative, comparable, and scalable mechanistic interpretability of neural networks.

Contribution

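The paper proposes a formal infrastructure for circuit interpretation that pairs a Causal Functional Signature (CFS) with an architectural signature learned by inductive logic programming (ILP), so that circuits can be compared with one another and transferred across models.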

Findings

01. CFS reveals distinct computational strategies across tasks.

02. ILP signatures outperform graph kernel and feature-vector baselines.

03. Supports transfer across model scales and architectures.

Abstract

Mechanistic interpretability produces circuit-level causal analyses of neural network behaviour, but discovered circuits often remain isolated experimental artefacts: there is no shared formal representation for what circuits compute, how they relate, or when two findings provide evidence for the same mechanism. This work provides a formal infrastructure for cumulative mechanistic science by treating circuit interpretation as inductive theory construction. Each circuit is characterised at two levels: a Causal Functional Signature (CFS), which grounds component behaviour in causal attribution evidence and token role profiles, and an architectural signature $\tau_{\mathrm{arch}}$, learned by inductive logic programming (ILP) from scale-invariant structural predicates. Together, these constitute a formal coherence layer that makes mechanistic claims explicit, comparable via…
