Causal Abstractions of Neural Networks

Atticus Geiger; Hanson Lu; Thomas Icard; Christopher Potts

arXiv:2106.02997·cs.AI·October 28, 2021

TL;DR

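This paper introduces a structural analysis method, grounded in a formal theory of causal abstraction, for characterizing the internal representations of neural networks and verifying those characterizations with interventions. A case study on natural language inference models finds that a BERT-based model encodes parts of the compositional causal structure behind the dataset.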

Contribution

The paper presents a structural analysis method based on causal abstraction: it aligns a network's internal representations with variables in an interpretable causal model, then uses interchange interventions to verify that the representations have the causal properties of their aligned variables.

Findings

01. BERT models encode parts of the natural logic causal structure.
02. Simpler models do not exhibit the same causal structure.
03. The method provides rich characterizations of model-internal representations and their roles in input/output behavior.

Abstract

Structural analysis methods (e.g., probing and feature attribution) are increasingly important tools for neural network analysis. We propose a new structural analysis method grounded in a formal theory of causal abstraction that provides rich characterizations of model-internal representations and their roles in input/output behavior. In this method, neural representations are aligned with variables in interpretable causal models, and then interchange interventions are used to experimentally verify that the neural representations have the causal properties of their aligned variables. We apply this method in a case study to analyze neural models trained on the Multiply Quantified Natural Language Inference (MQNLI) corpus, a highly complex NLI dataset that was constructed with a tree-structured natural logic causal model. We discover that a BERT-based model with state-of-the-art performance…
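
To make the idea of an interchange intervention concrete, here is a minimal sketch on a toy arithmetic task rather than the paper's MQNLI setup. Everything in it is an assumption for illustration: the high-level causal model (an intermediate variable S = a + b feeding OUT = S + c), the hand-written two-unit "network", and all function names are hypothetical and do not come from the paper or its code. The sketch runs the network on a base input, overwrites one hidden unit with the value it takes on a source input, and checks whether the result matches the high-level model's prediction when S is swapped in the same way.

```python
from itertools import product

# Hypothetical high-level causal model: S = a + b, then OUT = S + c.
def high_level(inputs, s_override=None):
    a, b, c = inputs
    s = a + b if s_override is None else s_override
    return s + c

def high_level_s(inputs):
    """Value of the intermediate variable S for a given input."""
    a, b, _ = inputs
    return a + b

# Toy stand-in for a trained network: two hidden units, one output.
def low_level_hidden(inputs):
    a, b, c = inputs
    return [a + b, c]

def low_level_output(hidden):
    return hidden[0] + hidden[1]

def interchange_low(base, source, unit):
    """Run on `base`, but copy hidden `unit` from the run on `source`."""
    hidden = low_level_hidden(base)
    hidden[unit] = low_level_hidden(source)[unit]
    return low_level_output(hidden)

def interchange_high(base, source):
    """Run the causal model on `base` with S set to its value on `source`."""
    return high_level(base, s_override=high_level_s(source))

def interchange_accuracy(unit, values=range(3)):
    """Fraction of (base, source) pairs where the network matches the causal model."""
    inputs = list(product(values, repeat=3))
    pairs = [(b, s) for b in inputs for s in inputs]
    hits = sum(
        interchange_low(b, s, unit) == interchange_high(b, s) for b, s in pairs
    )
    return hits / len(pairs)

acc_aligned = interchange_accuracy(unit=0)     # hidden unit 0 aligned with S
acc_misaligned = interchange_accuracy(unit=1)  # hidden unit 1 aligned with S
print(acc_aligned, acc_misaligned)
```

In this toy setting, aligning S with hidden unit 0 gives agreement on every base/source pair, while aligning it with hidden unit 1 does not. Agreement under intervention, not just on ordinary inputs, is what supports the claim that a representation plays the causal role of its aligned variable.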

Code & Models

Repositories

avariengien/causal-checker (PyTorch)

Videos

Causal Abstractions of Neural Networks · SlidesLive

Taxonomy

Topics: Topic Modeling · Explainable Artificial Intelligence (XAI) · Advanced Graph Neural Networks

Methods: Linear Layer · Layer Normalization · Residual Connection · Softmax · Dense Connections · Linear Warmup With Linear Decay · WordPiece · Weight Decay · Attention Is All You Need · Attention Dropout