The Computational Complexity of Circuit Discovery for Inner Interpretability

Federico Adolfi; Martina G. Vilas; Todd Wareham

arXiv:2410.08025·cs.AI·April 2, 2025

TL;DR

This paper analyzes the computational complexity of discovering circuits in neural networks for inner interpretability. It shows that many of the associated problems are intractable, while identifying some tractable cases and heuristic approaches.

Contribution

The paper formalizes a set of circuit discovery queries within a complexity-theoretic framework, analyzes their difficulty, and explores the limits and possibilities for scalable interpretability methods.

Findings

01. Many circuit discovery queries are intractable or fixed-parameter intractable.

02. Some queries are inapproximable under various schemes.

03. Transformations and heuristics can address certain hard problems.
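To give a rough intuition for why exhaustive circuit search scales poorly, the sketch below brute-forces the smallest set of hidden units that reproduces a toy network's behavior on every input. It is an illustration only, not the paper's formalization or proof technique: the network, its weights, the zero-ablation rule, and the names `HIDDEN`, `forward`, and `smallest_sufficient_circuit` are all hypothetical assumptions made for this example.

```python
from itertools import combinations, product

# Hypothetical toy network (an assumption for illustration):
# 3 binary inputs, 4 hidden threshold units, 1 output threshold unit.
HIDDEN = [
    ((1, 1, 0), -1.5),   # unit 0: fires when x0 AND x1
    ((0, 0, 1), -0.5),   # unit 1: fires when x2
    ((1, -1, 0), -2.0),  # unit 2: never fires on binary inputs
    ((1, 1, 0), -1.5),   # unit 3: redundant copy of unit 0
]
OUT_WEIGHTS = (1, 1, 1, 1)
OUT_BIAS = -0.5


def step(z):
    """Threshold activation: 1 if the pre-activation is positive, else 0."""
    return 1 if z > 0 else 0


def forward(x, kept):
    """Run the network on input x, zero-ablating hidden units not in `kept`."""
    hidden = [
        step(sum(w * xi for w, xi in zip(weights, x)) + bias) if i in kept else 0
        for i, (weights, bias) in enumerate(HIDDEN)
    ]
    return step(sum(w * h for w, h in zip(OUT_WEIGHTS, hidden)) + OUT_BIAS)


def smallest_sufficient_circuit():
    """Search subsets of hidden units by increasing size.

    Returns the first subset whose ablated network matches the full
    network on all inputs, plus the number of candidate subsets checked.
    """
    n = len(HIDDEN)
    inputs = list(product((0, 1), repeat=3))
    target = [forward(x, set(range(n))) for x in inputs]
    checked = 0
    for size in range(n + 1):
        for kept in combinations(range(n), size):
            checked += 1
            if [forward(x, set(kept)) for x in inputs] == target:
                return kept, checked
    return tuple(range(n)), checked


if __name__ == "__main__":
    circuit, checked = smallest_sufficient_circuit()
    print("smallest circuit:", circuit, "after checking", checked, "subsets")
```

With n hidden units this search may examine up to 2^n candidate subsets, and each check here enumerates all 2^m inputs for m binary inputs. That is manageable for four units and three inputs, but it grows exponentially, which is the kind of scaling concern the complexity analysis addresses formally.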

Abstract

Many proposed applications of neural networks in machine learning, cognitive/brain science, and society hinge on the feasibility of inner interpretability via circuit discovery. This calls for empirical and theoretical explorations of viable algorithmic options. Despite advances in the design and testing of heuristics, there are concerns about their scalability and faithfulness at a time when we lack understanding of the complexity properties of the problems they are deployed to solve. To address this, we study circuit discovery with classical and parameterized computational complexity theory: (1) we describe a conceptual scaffolding to reason about circuit finding queries in terms of affordances for description, explanation, prediction and control; (2) we formalize a comprehensive set of queries for mechanistic explanation, and propose a formal framework for their analysis; (3) we use…

Videos

The Computational Complexity of Circuit Discovery for Inner Interpretability · SlidesLive

Taxonomy

Topics: Statistical and Computational Modeling · Neural Networks and Applications · Advanced Database Systems and Queries

Methods: Sparse Evolutionary Training