What if This Modified That? Syntactic Interventions via Counterfactual Embeddings

Mycal Tucker, Peng Qian, and Roger Levy

arXiv:2105.14002·cs.CL·September 21, 2021

TL;DR

This paper introduces a causally inspired method for generating counterfactual embeddings within neural language models, and reports evidence suggesting that some BERT-based models use a tree-distance-like representation of syntax in downstream prediction tasks.

Contribution

The paper presents a technique for creating counterfactual embeddings to better understand model reasoning. It targets a limitation of probe-based interpretability methods: it is unclear how faithfully probes portray the information that models actually use.

Findings

01

Some BERT-based models appear to use a tree-distance-like representation of syntax in downstream prediction tasks

02

Counterfactual embeddings can reveal how models use syntactic information

03

The method provides insights into the internal representations of language models
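
To make "tree-distance-like" concrete: tree distance is usually taken to mean the number of edges on the path between two words in a parse tree. The sketch below computes it for every word pair in a small dependency tree. The sentence, the head indices, and the function name `tree_distances` are made up for illustration and do not come from the paper.

```python
from collections import deque


def tree_distances(heads):
    """Return pairwise path lengths between words in a dependency tree.

    heads[i] is the index of word i's head, or -1 if word i is the root.
    """
    n = len(heads)
    # Build an undirected adjacency list from the head pointers.
    adjacency = [[] for _ in range(n)]
    for child, head in enumerate(heads):
        if head >= 0:
            adjacency[child].append(head)
            adjacency[head].append(child)

    distances = [[0] * n for _ in range(n)]
    for start in range(n):
        # Breadth-first search gives the edge count from `start` to each word.
        seen = {start}
        queue = deque([(start, 0)])
        while queue:
            node, depth = queue.popleft()
            distances[start][node] = depth
            for neighbor in adjacency[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    queue.append((neighbor, depth + 1))
    return distances


# Hypothetical toy parse: "the" attaches to "dog", while "dog" and "cats"
# both attach to the root "chased".
words = ["the", "dog", "chased", "cats"]
heads = [1, 2, -1, 2]
distances = tree_distances(heads)
```

A representation is "tree-distance-like" in this sense if some distance between word embeddings tracks the entries of `distances`, so that words close together in the parse are also close together in embedding space.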

Abstract

Neural language models exhibit impressive performance on a variety of tasks, but their internal reasoning may be difficult to understand. Prior art aims to uncover meaningful properties within model representations via probes, but it is unclear how faithfully such probes portray information that the models actually use. To overcome such limitations, we propose a technique, inspired by causal analysis, for generating counterfactual embeddings within models. In experiments testing our technique, we produce evidence that suggests some BERT-based models use a tree-distance-like representation of syntax in downstream prediction tasks.
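
The abstract does not spell out how counterfactual embeddings are generated. One plausible reading, sketched below under stated assumptions, is to hold a distance probe fixed and run gradient descent on the embeddings themselves until the probe reads out an alternative parse. Everything here is illustrative: the probe is plain squared Euclidean distance (a trained probe would normally apply a learned linear map first), and the toy embeddings, target distances, learning rate, and function names are hypothetical.

```python
def squared_distance(u, v):
    """Squared Euclidean distance, standing in for a trained distance probe."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def probe_loss(z, target):
    """Sum of squared errors between probe distances and target tree distances."""
    n = len(z)
    return sum(
        (squared_distance(z[i], z[j]) - target[i][j]) ** 2
        for i in range(n)
        for j in range(i + 1, n)
    )


def counterfactual_step(z, target, lr):
    """Take one gradient-descent step on the embeddings; the probe stays fixed."""
    n, dim = len(z), len(z[0])
    grads = [[0.0] * dim for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            err = squared_distance(z[i], z[j]) - target[i][j]
            for k in range(dim):
                # d/dz_i of (||z_i - z_j||^2 - t_ij)^2 = 4 * err * (z_i - z_j)
                grads[i][k] += 4.0 * err * (z[i][k] - z[j][k])
    return [[z[i][k] - lr * grads[i][k] for k in range(dim)] for i in range(n)]


# Hypothetical original embeddings for ["the", "dog", "chased", "cats"].
# Their squared distances match a parse where every word attaches to "chased".
original = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 1.0]]

# Hypothetical counterfactual parse: the chain the - dog - chased - cats.
counterfactual_target = [
    [0, 1, 2, 3],
    [1, 0, 1, 2],
    [2, 1, 0, 1],
    [3, 2, 1, 0],
]

initial_loss = probe_loss(original, counterfactual_target)
embeddings = original
for _ in range(2000):
    embeddings = counterfactual_step(embeddings, counterfactual_target, lr=0.002)
final_loss = probe_loss(embeddings, counterfactual_target)
```

Under this reading, the updated `embeddings` would replace the originals at the probed layer and the rest of the model would run on them. If downstream predictions shift in line with the counterfactual parse, that is evidence the model actually uses the probed information, rather than the probe merely being able to decode it.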

Code & Models

Repositories

mycal-tucker/causal-probe
PyTorch · Official

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Domain Adaptation and Few-Shot Learning