The Computational Complexity of Circuit Discovery for Inner Interpretability
Federico Adolfi, Martina G. Vilas, Todd Wareham

TL;DR
This paper analyzes the computational complexity of circuit discovery in neural networks for inner interpretability. It shows that many circuit-finding queries are intractable, while identifying some tractable cases and heuristic approaches.
Contribution
The paper formalizes a set of circuit discovery queries for mechanistic explanation, analyzes their difficulty with classical and parameterized complexity theory, and explores the limits and possibilities for scalable interpretability methods.
Findings
Many circuit discovery queries are intractable or fixed-parameter intractable.
Some queries are inapproximable under various schemes.
Transformations and heuristics can address certain hard problems.
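To give an informal feel for why exhaustive circuit search scales poorly, here is a minimal sketch. It is not the paper's formalization: the toy threshold network, the zero-ablation rule, the exhaustive check over all binary inputs, and every name in the code (`forward`, `minimal_sufficient_circuit`, the weights) are assumptions made for illustration only. The sketch searches hidden-unit subsets in order of increasing size until it finds one that reproduces the full network's behavior.

```python
from itertools import combinations, product


def step(z):
    """Threshold activation: 1 if the pre-activation is positive, else 0."""
    return 1 if z > 0 else 0


def forward(x, w_in, b, w_out, b_out, keep):
    """One-hidden-layer threshold network.

    Hidden units whose index is not in `keep` are ablated (forced to 0).
    """
    hidden = [
        step(sum(w * xi for w, xi in zip(w_in[j], x)) + b[j]) if j in keep else 0
        for j in range(len(w_in))
    ]
    return step(sum(w * h for w, h in zip(w_out, hidden)) + b_out)


def minimal_sufficient_circuit(w_in, b, w_out, b_out, n_inputs):
    """Brute-force search for a smallest hidden-unit subset that matches
    the full network on every binary input. Returns (subset, candidates tried)."""
    n_hidden = len(w_in)
    all_units = frozenset(range(n_hidden))
    inputs = list(product([0, 1], repeat=n_inputs))
    target = [forward(x, w_in, b, w_out, b_out, all_units) for x in inputs]

    checked = 0
    for size in range(n_hidden + 1):
        for subset in combinations(range(n_hidden), size):
            checked += 1
            keep = frozenset(subset)
            outputs = [forward(x, w_in, b, w_out, b_out, keep) for x in inputs]
            if outputs == target:
                return keep, checked
    return all_units, checked  # not reached: the full set always matches


# Hypothetical toy network computing XOR with two irrelevant hidden units.
W_IN = [[1, 1], [1, 1], [1, 0], [0, 1]]   # unit 0: AND, unit 1: OR, units 2-3: unused
B = [-1.5, -0.5, -0.5, -0.5]
W_OUT = [-1, 1, 0, 0]                     # output = OR and not AND
B_OUT = -0.5

circuit, checked = minimal_sufficient_circuit(W_IN, B, W_OUT, B_OUT, n_inputs=2)
print(sorted(circuit), checked)
```

With n hidden units there are 2^n candidate subsets, and each candidate here is checked against all 2^k inputs for k input bits, so this naive strategy quickly becomes infeasible as networks grow. That observation concerns only this brute-force strategy; it is not a hardness proof and says nothing about which specific queries the paper shows to be intractable.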
Abstract
Many proposed applications of neural networks in machine learning, cognitive/brain science, and society hinge on the feasibility of inner interpretability via circuit discovery. This calls for empirical and theoretical explorations of viable algorithmic options. Despite advances in the design and testing of heuristics, there are concerns about their scalability and faithfulness at a time when we lack understanding of the complexity properties of the problems they are deployed to solve. To address this, we study circuit discovery with classical and parameterized computational complexity theory: (1) we describe a conceptual scaffolding to reason about circuit finding queries in terms of affordances for description, explanation, prediction and control; (2) we formalize a comprehensive set of queries for mechanistic explanation, and propose a formal framework for their analysis; (3) we use…
Taxonomy
Topics: Statistical and Computational Modeling · Neural Networks and Applications · Advanced Database Systems and Queries
Methods: Sparse Evolutionary Training
