The Low-Dimensional Linear Geometry of Contextualized Word Representations

Evan Hernandez; Jacob Andreas

arXiv:2105.07109·cs.CL·September 15, 2021

Open Access

TL;DR

This paper investigates how linguistic features are encoded in low-dimensional linear subspaces of contextualized word representations from ELMo and BERT, and finds that these encodings are hierarchically organized and distributed across neurons.

Contribution

It provides a systematic analysis of the linear geometry of linguistic features in ELMo and BERT, showing hierarchical relations between general and specific feature subspaces and a causal link between those subspaces and model behavior.

Findings

01. Linguistic features are encoded in low-dimensional subspaces.

02. Hierarchical relations exist between general and specific feature subspaces.

03. Linear subspaces can causally influence model outputs.

Abstract

Black-box probing models can reliably extract linguistic features like tense, number, and syntactic role from pretrained word representations. However, the manner in which these features are encoded in representations remains poorly understood. We present a systematic study of the linear geometry of contextualized word representations in ELMo and BERT. We show that a variety of linguistic features (including structured dependency relationships) are encoded in low-dimensional subspaces. We then refine this geometric picture, showing that there are hierarchical relations between the subspaces encoding general linguistic categories and more specific ones, and that low-dimensional feature encodings are distributed rather than aligned to individual neurons. Finally, we demonstrate that these linear subspaces are causally related to model behavior, and can be used to perform fine-grained…
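
To make the idea of a feature living in a low-dimensional linear subspace concrete, here is a minimal sketch on synthetic data. It is not the authors' procedure: the data generator, the least-squares probe, the SVD rank truncation, and every name below (make_data, fit_linear_probe, truncate_rank) are assumptions chosen for illustration. A 4-way label is planted in a 3-dimensional subspace of 64-dimensional vectors, a linear probe is fit, its weights are truncated to rank 3, and the probe's readout subspace is then projected out of the test vectors.

```python
import numpy as np

def make_data(n=2000, d=64, noise=0.5, seed=0):
    """Synthetic stand-in for word representations (an assumption, not real
    ELMo/BERT data): a 4-way label is encoded in a 3-dim subspace of R^d."""
    rng = np.random.default_rng(seed)
    basis, _ = np.linalg.qr(rng.normal(size=(d, 3)))  # orthonormal d x 3
    # Fixed, well-separated class centers (vertices of a tetrahedron).
    centers = 3.0 * np.array([[1, 1, 1], [1, -1, -1], [-1, 1, -1], [-1, -1, 1]])
    y = rng.integers(0, 4, size=n)
    X = centers[y] @ basis.T + noise * rng.normal(size=(n, d))
    return X, y

def fit_linear_probe(X, y, n_classes):
    """Least-squares linear probe onto one-hot targets, with a bias term."""
    A = np.hstack([X, np.ones((len(X), 1))])
    Y = np.eye(n_classes)[y]
    coef, *_ = np.linalg.lstsq(A, Y, rcond=None)
    return coef[:-1], coef[-1]  # weights (d x c), bias (c,)

def truncate_rank(W, k):
    """Best rank-k approximation of W, plus an orthonormal basis (d x k)
    for the input subspace that the truncated probe reads from."""
    U, S, Vt = np.linalg.svd(W, full_matrices=False)
    return (U[:, :k] * S[:k]) @ Vt[:k], U[:, :k]

def accuracy(X, y, W, b):
    return float(np.mean(np.argmax(X @ W + b, axis=1) == y))

X, y = make_data()
X_train, y_train, X_test, y_test = X[:1500], y[:1500], X[1500:], y[1500:]

W, b = fit_linear_probe(X_train, y_train, n_classes=4)
acc_full = accuracy(X_test, y_test, W, b)

# Rank-3 probe: reads only a 3-dim subspace of the 64-dim input.
W3, U3 = truncate_rank(W, 3)
acc_rank3 = accuracy(X_test, y_test, W3, b)

# Ablation: project the probe's subspace out of the test representations
# and apply the same (not refit) probe.
X_ablated = X_test - X_test @ U3 @ U3.T
acc_ablated = accuracy(X_ablated, y_test, W, b)

print("full probe:", acc_full)
print("rank-3 probe:", acc_rank3)
print("after ablation:", acc_ablated)
```

The ablation step only shows that this particular fitted probe can no longer read the label once its subspace is removed. A probe refit on the ablated vectors might still recover some signal, so this sketch should not be read as a test of the causal claims in the abstract, which concern the behavior of the model itself.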

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Software Engineering Research

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Sigmoid Activation · Tanh Activation · Long Short-Term Memory · Layer Normalization · Linear Warmup With Linear Decay · Weight Decay · Attention Dropout