Intrinsic Probing through Dimension Selection

Lucas Torroba Hennigen; Adina Williams; Ryan Cotterell

arXiv:2010.02812 · cs.CL · October 7, 2020

TL;DR

This paper introduces an intrinsic probing framework, built on a decomposable multivariate Gaussian probe, for analyzing how linguistic information is structured within word embeddings. It finds that most morphosyntactic attributes are localized in only a few neurons.

Contribution

The paper proposes an intrinsic probing method, based on a decomposable multivariate Gaussian probe, for determining whether the linguistic information in word embeddings is dispersed or focal.

Findings

1. Most morphosyntactic attributes are reliably encoded by only a few neurons.

2. fastText concentrates its linguistic structure more than BERT does.

3. The framework makes it possible to distinguish dispersed from focal linguistic information.

Abstract

Most modern NLP systems make use of pre-trained contextual representations that attain astonishingly high performance on a variety of tasks. Such high performance should not be possible unless some form of linguistic structure inheres in these representations, and a wealth of research has sprung up on probing for it. In this paper, we draw a distinction between intrinsic probing, which examines how linguistic information is structured within a representation, and the extrinsic probing popular in prior work, which only argues for the presence of such information by showing that it can be successfully extracted. To enable intrinsic probing, we propose a novel framework based on a decomposable multivariate Gaussian probe that allows us to determine whether the linguistic information in word embeddings is dispersed or focal. We then probe fastText and BERT for various morphosyntactic…
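The idea of a decomposable Gaussian probe can be illustrated with a small sketch. This is not the authors' implementation: it assumes a plain maximum-likelihood class-conditional Gaussian per attribute value, a greedy search over dimensions, and synthetic data, and every function name below (`fit_gaussians`, `log_joint`, `posterior_log_likelihood`, `greedy_select`) is hypothetical. What it shows is why a Gaussian is convenient here: the marginal over any subset of dimensions is obtained by slicing the mean and covariance, so the probe is fitted once and then scored on arbitrary subsets without retraining.

```python
import numpy as np


def fit_gaussians(X, y, ridge=1e-3):
    """Fit one full-covariance Gaussian per class, plus class log-priors.

    The ridge term is an assumption added to keep covariances invertible.
    """
    params = {}
    for c in np.unique(y):
        Xc = X[y == c]
        mean = Xc.mean(axis=0)
        cov = np.cov(Xc, rowvar=False) + ridge * np.eye(X.shape[1])
        params[c] = (mean, cov, np.log(len(Xc) / len(X)))
    return params


def log_joint(X, dims, mean, cov, log_prior):
    """log p(x[dims] | c) + log p(c), using only the marginal over dims."""
    d = list(dims)
    mu = mean[d]                      # marginal mean: slice the mean
    S = cov[np.ix_(d, d)]             # marginal covariance: slice the block
    diff = X[:, d] - mu
    _, logdet = np.linalg.slogdet(S)
    maha = np.einsum("ni,ij,nj->n", diff, np.linalg.inv(S), diff)
    return log_prior - 0.5 * (maha + logdet + len(d) * np.log(2 * np.pi))


def posterior_log_likelihood(X, y, dims, params):
    """Mean log p(y | x[dims]) under the Gaussian probe."""
    classes = sorted(params)
    scores = np.stack(
        [log_joint(X, dims, *params[c]) for c in classes], axis=1
    )
    log_norm = np.logaddexp.reduce(scores, axis=1)
    idx = np.searchsorted(classes, y)
    return float(np.mean(scores[np.arange(len(y)), idx] - log_norm))


def greedy_select(X, y, params, k):
    """Greedily pick k dimensions that maximize the posterior log-likelihood."""
    selected = []
    remaining = set(range(X.shape[1]))
    for _ in range(k):
        best = max(
            remaining,
            key=lambda j: posterior_log_likelihood(X, y, selected + [j], params),
        )
        selected.append(best)
        remaining.remove(best)
    return selected


# Synthetic "embeddings": 8 dimensions, with the label signal placed
# in dimension 3 only (a deliberately focal encoding).
rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=400)
X = rng.normal(size=(400, 8))
X[:, 3] += 4.0 * y

params = fit_gaussians(X, y)
selected = greedy_select(X, y, params, k=2)
print("selected dimensions:", selected)
```

In this toy setting the first selected dimension is the one carrying the signal, and adding further dimensions barely changes the score, which is the focal pattern. A dispersed encoding would instead show the score improving gradually as more dimensions are added.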

Code & Models

Repositories

rycolab/intrinsic-probing (PyTorch, official)

Taxonomy

Methods: Linear Layer · fastText · Dense Connections · Layer Normalization · WordPiece · Multi-Head Attention · Dropout · Linear Warmup With Linear Decay · Attention Dropout · Weight Decay