Efficient Discovery of Approximate Causal Abstractions via Neural Mechanism Sparsification

Amir Asiaee

arXiv:2602.24266 · cs.LG · March 2, 2026

Open Access

TL;DR

This paper introduces a method to efficiently identify interpretable causal mechanisms in neural networks by using structured pruning to find high-level causal abstractions that are faithful under interventions.

Contribution

The paper reframes causal abstraction discovery as a structured pruning problem. It derives an Interventional Risk objective whose second-order expansion gives closed-form criteria for extracting sparse, intervention-faithful causal mechanisms from trained networks.

Findings

01. Under uniform curvature, the pruning score reduces to activation variance, so variance-based pruning is recovered as a special case.

02. The procedure efficiently extracts intervention-faithful abstractions from pretrained networks.

03. Interchange interventions are used to validate the extracted abstractions.
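
To make the validation step concrete, here is a minimal sketch of an interchange intervention on a toy network. It assumes a hypothetical two-layer ReLU model in NumPy; the weights, layer sizes, and the function names `hidden`, `readout`, and `interchange` are illustrative and not taken from the paper. The network is run on a base input while selected hidden units are overwritten with the values they take on a source input.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy network: x -> h = relu(W1 x + b1) -> y = W2 h + b2
W1, b1 = rng.normal(size=(4, 3)), rng.normal(size=4)
W2, b2 = rng.normal(size=(2, 4)), rng.normal(size=2)

def hidden(x):
    """Hidden-layer activations for a single input vector."""
    return np.maximum(W1 @ x + b1, 0.0)

def readout(h):
    """Network output computed from hidden activations."""
    return W2 @ h + b2

def interchange(base, source, units):
    """Run on `base`, but set the hidden `units` to their values under `source`."""
    h = hidden(base).copy()
    h[units] = hidden(source)[units]
    return readout(h)

base, source = rng.normal(size=3), rng.normal(size=3)

# Swapping every hidden unit reproduces the source output;
# swapping none reproduces the base output.
y_all = interchange(base, source, np.arange(4))
y_none = interchange(base, source, np.array([], dtype=int))
```

A candidate abstraction would count as faithful on this input pair if its prediction under the same swap matches the network's output. The paper's actual evaluation protocol is not described in this summary.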

Abstract

Neural networks are hypothesized to implement interpretable causal mechanisms, yet verifying this requires finding a causal abstraction -- a simpler, high-level Structural Causal Model (SCM) faithful to the network under interventions. Discovering such abstractions is hard: it typically demands brute-force interchange interventions or retraining. We reframe the problem by viewing structured pruning as a search over approximate abstractions. Treating a trained network as a deterministic SCM, we derive an Interventional Risk objective whose second-order expansion yields closed-form criteria for replacing units with constants or folding them into neighbors. Under uniform curvature, our score reduces to activation variance, recovering variance-based pruning as a special case while clarifying when it fails. The resulting procedure efficiently extracts sparse, intervention-faithful…
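
As an illustration of the variance special case mentioned in the abstract, the sketch below scores hidden units by activation variance and replaces the lowest-variance unit with a constant (its mean activation), folding that constant into the next layer's bias. It is a toy under stated assumptions, not the paper's Interventional Risk criterion: the network, the data, and the helper `constant_replacement_cost` are hypothetical, and one unit is deliberately made input-independent so that the replacement is exact.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data and two-layer ReLU network (row-vector convention).
X = rng.normal(size=(512, 3))
W1, b1 = rng.normal(size=(3, 4)), rng.normal(size=4)
W2, b2 = rng.normal(size=(4, 2)), rng.normal(size=2)
W1[:, 2] = 0.0  # assumption for the demo: unit 2 ignores its input

H = np.maximum(X @ W1 + b1, 0.0)   # hidden activations, shape (512, 4)
scores = H.var(axis=0)             # variance score per hidden unit
j = int(np.argmin(scores))         # unit whose activation varies least
mu = H[:, j].mean()

# Replace unit j by the constant mu: drop its row from W2 and
# fold mu * W2[j] into the output bias.
keep = np.delete(np.arange(4), j)
W2_pruned = W2[keep]
b2_pruned = b2 + mu * W2[j]

Y_full = H @ W2 + b2
Y_pruned = H[:, keep] @ W2_pruned + b2_pruned

def constant_replacement_cost(k):
    """Mean squared output change when unit k is replaced by its mean."""
    delta = np.outer(H[:, k] - H[:, k].mean(), W2[k])
    return (delta ** 2).sum(axis=1).mean()
```

In this linear-readout toy, the cost of replacing unit `k` by its mean equals `scores[k]` times the squared norm of `W2[k]`, so variance alone ranks units correctly only when the outgoing weight norms are equal. That is a loose analogue of the uniform-curvature condition in the abstract, offered here as intuition rather than as the paper's derivation.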

Taxonomy

Topics: Bayesian Modeling and Causal Inference · Explainable Artificial Intelligence (XAI) · Generative Adversarial Networks and Image Synthesis