From Mechanistic to Compositional Interpretability

Ward Gauderis; Thomas Dooms; Steven T. Holmer; Kola Ayonrinde; Geraint A. Wiggins

arXiv:2605.08934 · cs.LG · May 12, 2026


TL;DR

This paper introduces compositional interpretability, a formal category-theoretic framework for neural network interpretability that makes explanations of model behavior systematic, verifiable, and concise.

Contribution

The paper develops a formal framework for interpretability that connects mechanistic explanations with compositionality and minimum description length. It also introduces compressive refinement, which restructures models into simpler parts without altering their function.

Findings

01. Framework unifies mechanistic interpretability with compositionality.

02. Proves a parsimony criterion for concise explanations.

03. Situates existing methods as special cases within the framework.

Abstract

Mechanistic interpretability aims to explain neural model behaviour by reverse-engineering learned computational structure into human-understandable components. Without a formal framework, however, mechanistic explanations cannot be objectively verified, compared, or composed. We introduce compositional interpretability, a category-theoretic framework grounded in the principles of compositionality and minimum description length. Compositional interpretations are pairs of syntactic and semantic mappings that must commute to enforce consistency between a model's decomposition and its observed behaviour. We deconstruct explanation quality into measures of faithfulness and complexity to cast interpretability as a constrained optimisation problem, and introduce compressive refinement to systematically restructure models into simpler parts without altering their function. Finally, we prove a…
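The commuting condition and the faithfulness/complexity trade-off described in the abstract can be illustrated with a toy sketch. This is not the paper's formalism: the model, the abstraction maps, the candidate explanations, and the complexity scores below are all hypothetical and chosen only to make the idea concrete. Faithfulness is approximated here as the fraction of inputs on which "run the model, then abstract" agrees with "abstract, then run the explanation", and selection picks the lowest-complexity candidate that meets a faithfulness threshold.

```python
from typing import Callable, Iterable, List, Tuple

# Hypothetical low-level model: parity of (x + 3), computed in two steps.
def model(x: int) -> int:
    hidden = x + 3
    return hidden % 2

# Hypothetical abstraction maps from low-level values to high-level concepts.
def abstract_input(x: int) -> str:
    return "even" if x % 2 == 0 else "odd"

def abstract_output(y: int) -> str:
    return "even" if y == 0 else "odd"

# Candidate high-level explanations (all illustrative).
def identity(concept: str) -> str:
    return concept

def flip_parity(concept: str) -> str:
    return "odd" if concept == "even" else "even"

LOOKUP = {"even": "odd", "odd": "even"}

def lookup_table(concept: str) -> str:
    return LOOKUP[concept]

def faithfulness(explanation: Callable[[str], str], inputs: Iterable[int]) -> float:
    """Fraction of inputs on which both paths around the square agree."""
    xs = list(inputs)
    agree = sum(
        abstract_output(model(x)) == explanation(abstract_input(x)) for x in xs
    )
    return agree / len(xs)

# (name, explanation, complexity); the complexity numbers are made-up
# stand-ins for a description-length measure.
CANDIDATES: List[Tuple[str, Callable[[str], str], int]] = [
    ("identity", identity, 1),
    ("flip_parity", flip_parity, 2),
    ("lookup_table", lookup_table, 5),
]

def select(inputs: Iterable[int], threshold: float) -> str:
    """Return the least complex candidate whose faithfulness meets the threshold."""
    xs = list(inputs)
    feasible = [
        (cost, name)
        for name, fn, cost in CANDIDATES
        if faithfulness(fn, xs) >= threshold
    ]
    return min(feasible)[1]  # raises ValueError if no candidate is feasible

if __name__ == "__main__":
    xs = range(-10, 10)
    for name, fn, cost in CANDIDATES:
        print(name, cost, faithfulness(fn, xs))
    print("selected:", select(xs, threshold=1.0))
```

In this toy setting the cheapest candidate (`identity`) fails the commuting check, so the constraint rules it out even though it has the lowest complexity score.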
