Efficient Discovery of Approximate Causal Abstractions via Neural Mechanism Sparsification
Amir Asiaee

TL;DR
This paper introduces a method to efficiently identify interpretable causal mechanisms in neural networks by using structured pruning to find high-level causal abstractions that are faithful under interventions.
Contribution
It reframes causal abstraction discovery as a structured pruning problem, deriving a new objective and criteria for extracting sparse, intervention-faithful causal mechanisms from trained networks.
Findings
The method recovers variance-based pruning as a special case.
It efficiently extracts intervention-faithful abstractions from pretrained networks.
Validation via interchange interventions confirms that the extracted abstractions remain faithful under intervention.
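For readers unfamiliar with the validation step, the idea of an interchange intervention can be sketched as follows: run the network on a base input, overwrite a chosen set of hidden units with the values they take on a source input, and read off the output. This is a minimal illustration on a hypothetical two-layer ReLU network; the weights, sizes, and the function names `hidden`, `readout`, and `interchange` are assumptions for illustration and are not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy network (not the paper's model): x -> relu(W1 x + b1) -> W2 h + b2
W1 = rng.normal(size=(6, 3))
b1 = rng.normal(size=6)
W2 = rng.normal(size=(2, 6))
b2 = rng.normal(size=2)


def hidden(x):
    """Hidden-layer activations for a single input vector."""
    return np.maximum(W1 @ x + b1, 0.0)


def readout(h):
    """Map hidden activations to the network output."""
    return W2 @ h + b2


def interchange(base, source, units):
    """Run on `base`, but set the hidden `units` to their values from the `source` run."""
    idx = np.asarray(units, dtype=int)
    h = hidden(base).copy()
    h[idx] = hidden(source)[idx]
    return readout(h)


base = rng.normal(size=3)
source = rng.normal(size=3)

out_none = interchange(base, source, [])              # nothing swapped: plain base run
out_some = interchange(base, source, [0, 1])          # swap two hidden units
out_all = interchange(base, source, list(range(6)))   # everything swapped: plain source run
```

To test an abstraction, the output of such a swap would be compared against what the high-level model predicts under the corresponding intervention on its own variables; that comparison is not shown here because the high-level model is not specified in this summary.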
Abstract
Neural networks are hypothesized to implement interpretable causal mechanisms, yet verifying this requires finding a causal abstraction -- a simpler, high-level Structural Causal Model (SCM) faithful to the network under interventions. Discovering such abstractions is hard: it typically demands brute-force interchange interventions or retraining. We reframe the problem by viewing structured pruning as a search over approximate abstractions. Treating a trained network as a deterministic SCM, we derive an Interventional Risk objective whose second-order expansion yields closed-form criteria for replacing units with constants or folding them into neighbors. Under uniform curvature, our score reduces to activation variance, recovering variance-based pruning as a special case while clarifying when it fails. The resulting procedure efficiently extracts sparse, intervention-faithful…
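The variance special case mentioned in the abstract can be illustrated with a small sketch. The reasoning assumed here is that if replacing a unit's activation a with a constant c costs an amount proportional to E[(a - c)^2], the best constant is the mean of a and the remaining cost is proportional to Var(a). The code below scores hidden units by activation variance, replaces the lowest-scoring units with their mean, and folds that constant into the next layer's bias. The toy network, its sizes, and the helper names `prune_to_constants` and `pruned_forward` are hypothetical; this is not the paper's Interventional Risk objective, which is not given in this summary.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy network (not the paper's model): x -> relu(W1 x + b1) -> W2 h + b2
d_in, d_hid, d_out, n = 4, 8, 2, 2000
W1 = rng.normal(size=(d_hid, d_in))
b1 = rng.normal(size=d_hid)
W2 = rng.normal(size=(d_out, d_hid))
b2 = rng.normal(size=d_out)

# Assumption for the demo: make units 0 and 1 exactly constant (activation 1.0).
W1[:2] = 0.0
b1[:2] = 1.0

X = rng.normal(size=(n, d_in))


def hidden(X):
    return np.maximum(X @ W1.T + b1, 0.0)


def forward(X):
    return hidden(X) @ W2.T + b2


H = hidden(X)
scores = H.var(axis=0)   # variance score per hidden unit
means = H.mean(axis=0)   # constant that minimizes squared replacement error


def prune_to_constants(drop):
    """Replace units in `drop` by their mean, folded into the next layer's bias."""
    keep = np.setdiff1d(np.arange(d_hid), drop)
    b2_new = b2 + W2[:, drop] @ means[drop]
    return keep, W2[:, keep], b2_new


def pruned_forward(X, keep, W2k, b2k):
    return hidden(X)[:, keep] @ W2k.T + b2k


order = np.argsort(scores)
low, high = order[:2], order[-2:]

err_low = np.mean((forward(X) - pruned_forward(X, *prune_to_constants(low))) ** 2)
err_high = np.mean((forward(X) - pruned_forward(X, *prune_to_constants(high))) ** 2)
print(f"error pruning lowest-variance units:  {err_low:.3e}")
print(f"error pruning highest-variance units: {err_high:.3e}")
```

In this sketch the score ignores how strongly the next layer weights each unit, so a low-variance unit with large outgoing weights could still matter; that is one way a pure variance criterion can mislead, in line with the abstract's remark that the special case can fail.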
Taxonomy
Topics: Bayesian Modeling and Causal Inference · Explainable Artificial Intelligence (XAI) · Generative Adversarial Networks and Image Synthesis
