Causal Proxy Models for Concept-Based Model Explanations

Zhengxuan Wu; Karel D'Oosterlinck; Atticus Geiger; Amir Zur; Christopher Potts

arXiv:2209.14279 · cs.CL · September 29, 2022 · 6 cites

TL;DR

This paper introduces Causal Proxy Models (CPMs), which use approximate counterfactuals to provide causal explanations for NLP models, so that explanations do not depend on true counterfactual texts, which are never observed.

Contribution

The paper proposes CPMs, which are trained to mimic the input/output behavior of a black-box model while building representations that can be intervened upon to explain that model.

Findings

01. CPMs can replicate the input/output behavior of black-box models.

02. CPMs enable counterfactual interventions for model explanations.

03. CPMs perform comparably to original models in factual predictions.

Abstract

Explainability methods for NLP systems encounter a version of the fundamental problem of causal inference: for a given ground-truth input text, we never truly observe the counterfactual texts necessary for isolating the causal effects of model representations on outputs. In response, many explainability methods make no use of counterfactual texts, assuming they will be unavailable. In this paper, we show that robust causal explainability methods can be created using approximate counterfactuals, which can be written by humans to approximate a specific counterfactual or simply sampled using metadata-guided heuristics. The core of our proposal is the Causal Proxy Model (CPM). A CPM explains a black-box model $N$ because it is trained to have the same actual input/output behavior as $N$ while creating neural representations that can be intervened upon to simulate the…
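The two properties named in the abstract (matching the factual input/output behavior of $N$, and exposing a representation that can be intervened upon) can be illustrated with a toy sketch. This is not the paper's training procedure or architecture: the scoring functions, the two-slot hidden state, the choice of slot 0 as the concept slot, and all names (`black_box`, `proxy_hidden`, `proxy_output`, `mimic_loss`, `intervene`) are hypothetical and chosen only to make the idea concrete.

```python
# Toy sketch only: the black box, the proxy, and the hidden-state layout
# are hypothetical stand-ins, not the models used in the paper.

def black_box(x):
    """Stand-in for the black-box model N: a fixed scoring function."""
    return 2.0 * x[0] - 1.0 * x[1]

def proxy_hidden(x):
    """Proxy's hidden state; slot 0 is assumed to encode the concept."""
    return [2.0 * x[0], -1.0 * x[1]]

def proxy_output(hidden):
    """Proxy's output head: sums the hidden state."""
    return sum(hidden)

def mimic_loss(inputs):
    """Mean squared gap between proxy and black-box outputs on factual inputs."""
    gaps = [(proxy_output(proxy_hidden(x)) - black_box(x)) ** 2 for x in inputs]
    return sum(gaps) / len(gaps)

def intervene(base, source, concept_slots=(0,)):
    """Run the proxy on `base`, overwriting the concept slots of its hidden
    state with the values computed from `source` (an approximate counterfactual)."""
    hidden = proxy_hidden(base)
    source_hidden = proxy_hidden(source)
    for i in concept_slots:
        hidden[i] = source_hidden[i]
    return proxy_output(hidden)

base = [1.0, 3.0]      # factual input: concept feature is 1.0
source = [4.0, 7.0]    # approximate counterfactual: concept is 4.0, other feature differs too
true_cf = [4.0, 3.0]   # the unobserved true counterfactual: only the concept changed

factual_gap = mimic_loss([base, source])                # proxy vs. N on factual inputs
simulated = intervene(base, source)                     # proxy output after the swap
effect = simulated - proxy_output(proxy_hidden(base))   # estimated concept effect
```

In this toy setting the proxy matches the black box exactly on factual inputs, and swapping only the concept slot reproduces what the black box would output on `true_cf`, even though `source` differs from `base` in more than the concept. A trained CPM would only approximate both properties.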

Code & Models

Repositories

frankaging/causal-proxy-model
PyTorch · Official

Taxonomy

Topics: Explainable Artificial Intelligence (XAI) · Topic Modeling · Bayesian Modeling and Causal Inference