Causal Mediation Analysis for Interpreting Neural NLP: The Case of Gender Bias

Jesse Vig; Sebastian Gehrmann; Yonatan Belinkov; Sharon Qian; Daniel Nevo; Simas Sakenis; Jason Huang; Yaron Singer; Stuart Shieber

arXiv:2004.12265 · cs.CL · November 24, 2020 · 66 cites

TL;DR

This paper introduces a causal mediation analysis framework to interpret neural NLP models, revealing how specific components contribute to gender bias in Transformer models.

Contribution

It applies causal mediation analysis to neural NLP, uncovering the causal roles of neurons and attention heads in gender bias propagation.

Findings

01

Gender bias is concentrated in a few network components.

02

Bias effects are synergistic and context-dependent.

03

Bias can be decomposed into direct and mediated effects.
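The decomposition into direct and mediated (indirect) effects can be illustrated with a minimal sketch on a toy model. This is not the authors' code: the function names, the toy linear mediator and outcome, and the use of simple differences as the effect measure are all assumptions made for illustration, following the standard definitions of natural direct and indirect effects from the causal mediation literature. The paper's own effect measure and interventions may differ.

```python
from typing import Callable, NamedTuple


class Effects(NamedTuple):
    total: float
    direct: float
    indirect: float


def mediation_effects(
    mediator: Callable[[float], float],
    outcome: Callable[[float, float], float],
    x_null: float,
    x_set: float,
) -> Effects:
    """Difference-based total, natural direct, and natural indirect effects.

    Hypothetical helper: `mediator` maps the input to the mediator value,
    `outcome` maps (input, mediator value) to the response.
    """
    m_null = mediator(x_null)  # mediator value without the intervention
    m_set = mediator(x_set)    # mediator value under the intervention
    y_null = outcome(x_null, m_null)

    # Total effect: change the input and let the mediator respond.
    total = outcome(x_set, m_set) - y_null
    # Direct effect: change the input, hold the mediator at its null value.
    direct = outcome(x_set, m_null) - y_null
    # Indirect effect: keep the input, set the mediator to its intervened value.
    indirect = outcome(x_null, m_set) - y_null
    return Effects(total, direct, indirect)


# Toy linear model (assumed for illustration only).
def toy_mediator(x: float) -> float:
    return 2.0 * x + 1.0


def toy_outcome(x: float, m: float) -> float:
    return 0.5 * x + 0.25 * m


effects = mediation_effects(toy_mediator, toy_outcome, x_null=0.0, x_set=1.0)
print(effects)
```

In this linear toy model the direct and indirect effects sum to the total effect; with interactions between the input and the mediator that additivity generally does not hold. In a Transformer setting, the mediator would correspond to the activation of a neuron or attention head, and holding or setting its value would amount to overwriting that activation during a forward pass.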

Abstract

Common methods for interpreting neural models in natural language processing typically examine either their structure or their behavior, but not both. We propose a methodology grounded in the theory of causal mediation analysis for interpreting which parts of a model are causally implicated in its behavior. It enables us to analyze the mechanisms by which information flows from input to output through various model components, known as mediators. We apply this methodology to analyze gender bias in pre-trained Transformer language models. We study the role of individual neurons and attention heads in mediating gender bias across three datasets designed to gauge a model's sensitivity to gender bias. Our mediation analysis reveals that gender bias effects are (i) sparse, concentrated in a small part of the network; (ii) synergistic, amplified or repressed by different components; and (iii)…

Code & Models

Repositories

sebastianGehrmann/CausalMediationAnalysis
PyTorch · Official

Taxonomy

Topics: Topic Modeling · Explainable Artificial Intelligence (XAI) · Adversarial Robustness in Machine Learning

Methods: Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Residual Connection · Byte Pair Encoding · Dense Connections · Label Smoothing · Adam · Softmax