CausaLM: Causal Model Explanation Through Counterfactual Language Models

Amir Feder; Nadav Oved; Uri Shalit; Roi Reichart

arXiv:2005.13407 · cs.CL · November 15, 2022

TL;DR

CausaLM introduces a method to generate counterfactual language representations for explaining and analyzing the causal effects of concepts in neural language models, aiding interpretability and bias mitigation.

Contribution

The paper presents a novel framework using adversarial fine-tuning of language models to produce counterfactual representations for causal analysis in NLP.
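Adversarial fine-tuning of this kind is often implemented with a gradient reversal layer (GRL), which trains an encoder against an adversarial head. The sketch below is a hand-derived toy, not the CausaLM training code. It assumes a scalar encoder weight `w` (representation `h = w * x`), a task head `u` that predicts a label `y`, and an adversarial head `v` that predicts a concept `c`, all trained with squared error. The names `grl_forward`, `grl_backward`, and `step` are hypothetical. Each head minimizes its own loss, while the reversed gradient pushes the encoder to make the concept harder to predict from `h`.

```python
# Toy illustration of a gradient reversal layer (GRL); not CausaLM's actual code.
# Assumed setup: scalar encoder weight w gives representation h = w * x; a task
# head u predicts label y and an adversarial head v predicts concept c, both
# trained with loss 0.5 * (prediction - target) ** 2.
from typing import Tuple


def grl_forward(h: float) -> float:
    # Forward pass: the GRL is the identity.
    return h


def grl_backward(grad: float, lam: float) -> float:
    # Backward pass: flip the sign of the gradient and scale it by lam.
    return -lam * grad


def step(w: float, u: float, v: float, x: float, y: float, c: float,
         lam: float = 1.0, lr: float = 0.1) -> Tuple[float, float, float]:
    h = w * x
    y_hat = u * h                    # task head
    c_hat = v * grl_forward(h)       # adversarial head reads h through the GRL
    d_y, d_c = y_hat - y, c_hat - c  # dLoss/dprediction for each head
    # Each head takes an ordinary gradient-descent step on its own loss.
    grad_u, grad_v = d_y * h, d_c * h
    # The encoder receives the task gradient plus the reversed adversarial
    # gradient, so it is pushed to make c harder to predict from h.
    grad_h = d_y * u + grl_backward(d_c * v, lam)
    grad_w = grad_h * x
    return w - lr * grad_w, u - lr * grad_u, v - lr * grad_v


print(step(1.0, 1.0, 1.0, 1.0, 1.0, 0.0, lam=1.0))  # (1.1, 1.0, 0.9)
```

With `lam=1.0`, one step moves the encoder weight `w` from 1.0 to 1.1, which raises the adversary's error, while the adversary's own weight `v` moves from 1.0 to 0.9 to lower it. With `lam=0.0` the encoder ignores the adversary and `w` stays at 1.0. In a PyTorch model, the same behavior is typically written as a custom `torch.autograd.Function` whose `backward` returns the negated, scaled gradient.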

Findings

01. Effective estimation of causal effects using counterfactual representations
02. Language models can be fine-tuned to be unaffected by specific concepts
03. Potential for bias mitigation in language models
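One simple way to sanity-check a claim like finding 02 is to train a probe that predicts the concept from the representations and compare its held-out accuracy with a majority-class baseline. This is our assumption about how such a check could be done, not necessarily the paper's evaluation protocol. The sketch uses a nearest-centroid probe on made-up 2-D vectors. `nearest_centroid_accuracy` and `majority_baseline` are hypothetical helpers.

```python
# Hypothetical probing check: can a simple probe recover the concept label
# from a representation? Toy data only; not the paper's protocol.
from collections import Counter
from typing import Dict, List, Sequence

Vector = Sequence[float]


def centroid(vectors: List[Vector]) -> List[float]:
    # Coordinate-wise mean of a non-empty list of equal-length vectors.
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def sq_dist(a: Vector, b: Vector) -> float:
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def nearest_centroid_accuracy(train_x, train_c, test_x, test_c) -> float:
    # Fit one centroid per concept label on the training split.
    labels = sorted(set(train_c))
    cents: Dict[int, List[float]] = {
        lab: centroid([x for x, c in zip(train_x, train_c) if c == lab]) for lab in labels
    }
    correct = 0
    for x, c in zip(test_x, test_c):
        # min() keeps the first (smallest) label when distances tie.
        pred = min(labels, key=lambda lab: sq_dist(x, cents[lab]))
        correct += int(pred == c)
    return correct / len(test_c)


def majority_baseline(train_c, test_c) -> float:
    # Accuracy of always predicting the most frequent training label.
    majority = Counter(train_c).most_common(1)[0][0]
    return sum(int(c == majority) for c in test_c) / len(test_c)


# Toy "original" representations: the first coordinate encodes the concept.
orig_train = [(1.0, 0.1), (0.9, -0.2), (-1.0, 0.0), (-1.1, 0.2)]
# Toy "counterfactual" representations: same distribution for both labels.
cf_train = [(0.5, 0.0), (-0.5, 0.0), (0.5, 0.0), (-0.5, 0.0)]
train_c = [1, 1, 0, 0]
orig_test, cf_test, test_c = [(1.2, 0.0), (-0.8, 0.1)], [(0.4, 0.0), (-0.4, 0.0)], [1, 0]

print(nearest_centroid_accuracy(orig_train, train_c, orig_test, test_c))  # 1.0
print(nearest_centroid_accuracy(cf_train, train_c, cf_test, test_c))      # 0.5
print(majority_baseline(train_c, test_c))                                 # 0.5
```

A probe that only matches the baseline is weak evidence on its own: a stronger (for example, nonlinear) probe might still recover the concept, so this check complements rather than replaces a causal estimate.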

Abstract

Understanding predictions made by deep neural networks is notoriously difficult, but also crucial to their dissemination. Like all machine-learning-based methods, they are only as good as their training data and can also capture unwanted biases. While there are tools that can help determine whether such biases exist, they do not distinguish between correlation and causation and may be ill-suited for text-based models and for reasoning about high-level language concepts. A key problem in estimating the causal effect of a concept of interest on a given model is that this estimation requires generating counterfactual examples, which is challenging with existing generation technology. To bridge that gap, we propose CausaLM, a framework for producing causal model explanations using counterfactual language representation models. Our approach is based on fine-tuning of deep contextualized…
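Because counterfactual text is hard to generate, the comparison can instead be made at the level of representations. Run the same downstream classifier on the original and on the counterfactual representation of each input, then average how much its output changes. The sketch below is a minimal version of such an average-treatment-effect-style estimate. It assumes you already have per-example class-probability vectors from both models, and it uses the L1 distance as the per-example comparison. That distance is our choice and may differ from the paper's exact definition. `representation_effect` and `l1_distance` are hypothetical names.

```python
# Hypothetical sketch of a representation-based effect estimate; the choice of
# L1 distance is an assumption, not necessarily the paper's definition.
from typing import Sequence

Probs = Sequence[float]


def l1_distance(p: Probs, q: Probs) -> float:
    # Sum of absolute per-class differences between two probability vectors.
    if len(p) != len(q):
        raise ValueError("probability vectors must have the same length")
    return sum(abs(pi - qi) for pi, qi in zip(p, q))


def representation_effect(original: Sequence[Probs], counterfactual: Sequence[Probs]) -> float:
    # Average, over inputs, of how far the classifier's output moves when the
    # original representation is swapped for the counterfactual one.
    if len(original) != len(counterfactual) or not original:
        raise ValueError("need the same non-zero number of outputs from both models")
    total = sum(l1_distance(p, q) for p, q in zip(original, counterfactual))
    return total / len(original)


# Made-up outputs for three inputs (class probabilities: [negative, positive]).
original = [[0.9, 0.1], [0.6, 0.4], [0.2, 0.8]]
counterfactual = [[0.7, 0.3], [0.6, 0.4], [0.2, 0.8]]
print(round(representation_effect(original, counterfactual), 4))  # 0.1333
```

A value of 0 means the classifier's outputs do not change at all when the counterfactual representation is used. Larger values indicate a larger estimated effect of the concept on the classifier.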

Code & Models

Repositories

amirfeder/causalm (PyTorch)

Taxonomy

Methods: Linear Layer · Weight Decay · Softmax · Adam · Multi-Head Attention · Dropout · Attention Dropout · Linear Warmup With Linear Decay · Dense Connections