PLOT: Progressive Localization via Optimal Transport in Neural Causal Abstraction
Jonathn Chang, Arya Datla, Ziv Goldfeld

TL;DR
PLOT introduces a transport-based framework for localizing causal variables in neural networks, improving efficiency and accuracy in causal abstraction analysis through progressive refinement and optimal transport coupling.
Contribution
The paper proposes PLOT, a method that localizes causal variables using optimal transport and can guide existing causal abstraction techniques such as DAS.
Findings
PLOT localizes causal variables quickly and accurately.
PLOT-guided DAS achieves DAS-level accuracy with less runtime.
Transport-only PLOT performs well across various model complexities.
Abstract
Causal abstraction offers a principled framework for mechanistic interpretability, aligning a high-level causal model with the low-level computation realized by a neural network through counterfactual intervention analysis. Existing methods such as distributed alignment search (DAS) learn expressive subspace interventions, but the relevant neural site is unknown a priori, so finding a handle requires a computationally burdensome search over candidate sites. We introduce PLOT (Progressive Localization via Optimal Transport), a transport-based framework that localizes causal variables from the output effect geometry of abstract and neural interventions. PLOT fits an optimal transport coupling between abstract variables and candidate neural sites, yielding a global soft correspondence that can be calibrated into intervention handles. In simple settings, a single coupling over individual…
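To make the coupling idea concrete, the following is a minimal sketch of fitting an entropic optimal transport plan between abstract variables and candidate neural sites. It is not the authors' implementation. The "effect signatures" (one vector per variable or site summarizing how an intervention changes the output), the squared Euclidean cost, the uniform marginals, and the Sinkhorn solver are all assumptions chosen for illustration, and the toy data is hypothetical. The calibration of the coupling into intervention handles is not modeled here.

```python
import numpy as np

def effect_cost(abstract_effects, site_effects):
    """Pairwise squared Euclidean distance between effect signatures.

    Assumption: each row is a hypothetical vector describing how an
    intervention on that variable or site changes the model output.
    """
    diff = abstract_effects[:, None, :] - site_effects[None, :, :]
    return np.sum(diff ** 2, axis=-1)

def sinkhorn(cost, reg=0.1, n_iters=200):
    """Entropic OT coupling with uniform marginals (a standard Sinkhorn loop)."""
    n, m = cost.shape
    a = np.full(n, 1.0 / n)   # mass on each abstract variable
    b = np.full(m, 1.0 / m)   # mass on each candidate site
    K = np.exp(-cost / reg)   # Gibbs kernel
    u = np.ones(n)
    v = np.ones(m)
    for _ in range(n_iters):
        u = a / (K @ v)
        v = b / (K.T @ u)
    return u[:, None] * K * v[None, :]

# Toy data (hypothetical): two abstract variables, four candidate sites.
# Sites 0 and 1 reproduce the two abstract effects; sites 2 and 3 do not.
abstract_effects = np.array([[1.0, 0.0, 0.0],
                             [0.0, 1.0, 0.0]])
site_effects = np.array([[1.0, 0.0, 0.0],
                         [0.0, 1.0, 0.0],
                         [0.0, 0.0, 1.0],
                         [0.0, 0.0, 1.0]])

cost = effect_cost(abstract_effects, site_effects)
cost = cost / cost.max()                  # normalize so reg has a stable scale
coupling = sinkhorn(cost, reg=0.1)        # soft correspondence, shape (2, 4)
top_sites = coupling.argmax(axis=1)       # most strongly coupled site per variable
```

In this toy setup each abstract variable places most of its mass on the site whose effect signature matches it, while the two uninformative sites are shared evenly between the variables. The coupling is soft and global: every site receives some mass, so a further step is still needed to turn it into a concrete intervention handle.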
