Picking BERT's Brain: Probing for Linguistic Dependencies in Contextualized Embeddings Using Representational Similarity Analysis

Michael A. Lepori; R. Thomas McCoy

arXiv:2011.12073·cs.CL·November 25, 2020

TL;DR

This paper uses Representational Similarity Analysis (RSA) to test which linguistic dependencies BERT's contextualized embeddings encode, and finds that the embeddings reflect these dependencies more strongly than they reflect less linguistically salient controls.

Contribution

It introduces an RSA-based method for probing which aspects of context, here linguistic dependencies, are encoded in BERT's embeddings.

Findings

01 BERT's verb embeddings encode the verb's subject.

02 Pronoun embeddings encode the pronoun's antecedent.

03 Full-sentence representations encode the sentence's head word (as determined by a dependency parse).

Abstract

As the name implies, contextualized representations of language are typically motivated by their ability to encode context. Which aspects of context are captured by such representations? We introduce an approach to address this question using Representational Similarity Analysis (RSA). As case studies, we investigate the degree to which a verb embedding encodes the verb's subject, a pronoun embedding encodes the pronoun's antecedent, and a full-sentence representation encodes the sentence's head word (as determined by a dependency parse). In all cases, we show that BERT's contextualized embeddings reflect the linguistic dependency being studied, and that BERT encodes these dependencies to a greater degree than it encodes less linguistically-salient controls. These results demonstrate the ability of our approach to adjudicate between hypotheses about which aspects of context are encoded…
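
The abstract names RSA but does not spell out the computation. The following is a minimal, generic sketch of how an RSA comparison is commonly set up, not the authors' implementation: each set of vectors is turned into a pairwise dissimilarity structure over the same stimuli, and the two structures are compared with a rank correlation. The choice of cosine dissimilarity and Spearman correlation, the function names, and the toy vectors are all assumptions made for illustration; in practice the model vectors would come from BERT and the hypothesis vectors from some reference representation of the property under study.

```python
import math


def cosine_dissimilarity(u, v):
    """Return 1 - cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def rdm_upper_triangle(vectors):
    """Flatten the upper triangle of the pairwise dissimilarity matrix."""
    n = len(vectors)
    return [
        cosine_dissimilarity(vectors[i], vectors[j])
        for i in range(n)
        for j in range(i + 1, n)
    ]


def average_ranks(values):
    """Rank values from 1 to n, giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2.0 + 1.0
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; assumes neither input is constant."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y)


def rsa_score(model_vectors, hypothesis_vectors):
    """Spearman correlation between the two dissimilarity structures."""
    model_ranks = average_ranks(rdm_upper_triangle(model_vectors))
    hypothesis_ranks = average_ranks(rdm_upper_triangle(hypothesis_vectors))
    return pearson(model_ranks, hypothesis_ranks)


# Hypothetical toy data: one vector per stimulus (four stimuli).
toy_model = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
toy_control = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [1.0, 0.5]]

if __name__ == "__main__":
    print("model vs. itself: ", rsa_score(toy_model, toy_model))
    print("model vs. control:", rsa_score(toy_model, toy_control))
```

Under this setup, a higher score for a hypothesis representation than for a control representation would indicate that the model's geometry is more similar to the hypothesis. Because the comparison is made between dissimilarity structures rather than raw vectors, the two representations do not need to share a dimensionality.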

Code & Models

Repositories

mlepori1/Picking_BERTs_Brain (PyTorch, official)

Taxonomy

Methods: Linear Layer · Layer Normalization · Linear Warmup With Linear Decay · Residual Connection · Dropout · Softmax · Adam · Attention Is All You Need · Weight Decay · WordPiece