All Bark and No Bite: Rogue Dimensions in Transformer Language Models Obscure Representational Quality

William Timkey; Marten van Schijndel

arXiv:2109.04404·cs.CL·September 10, 2021


TL;DR

This paper shows that a few rogue dimensions dominate similarity measures in contextual language models such as BERT and GPT-2, which can mislead analyses built on those measures, and that simple postprocessing such as standardization corrects for the problem.

Contribution

The paper identifies how rogue dimensions distort similarity measures in contextualized models and demonstrates that standardization mitigates the problem, making the measures more informative about the underlying representations.

Findings

01

A small number of rogue dimensions dominate similarity measures.

02

Standardization corrects for rogue dimensions and reveals the representational quality they obscure.

03

The dimensions that dominate similarity measures are not the ones most important to model behavior.
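To make the first finding concrete, the sketch below splits a cosine similarity into one additive term per dimension, so the share owed to a single dimension can be read off directly. It uses synthetic vectors rather than real BERT or GPT-2 embeddings; the function name `cosine_contributions`, the vector width, and the planted value of 200 in dimension 0 are illustrative assumptions, not values from the paper.

```python
import numpy as np


def cosine_contributions(u, v):
    """Split cos(u, v) into one additive term per dimension.

    cos(u, v) = sum_i u_i * v_i / (||u|| * ||v||), so each term of the
    sum is the contribution of dimension i to the overall similarity.
    """
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    return (u * v) / (np.linalg.norm(u) * np.linalg.norm(v))


# Synthetic stand-ins for two contextual embeddings (assumption: 768 dims).
rng = np.random.default_rng(0)
dim = 768
u = rng.standard_normal(dim)
v = rng.standard_normal(dim)

# Hypothetical rogue dimension: one coordinate with a very large magnitude
# that is shared by both vectors.
u[0] = 200.0
v[0] = 200.0

contrib = cosine_contributions(u, v)
cosine = contrib.sum()              # equals the ordinary cosine similarity
rogue_share = contrib[0] / cosine   # fraction of the similarity from dimension 0

print(f"cosine similarity: {cosine:.3f}")
print(f"share from dimension 0: {rogue_share:.3f}")
```

In this toy setup the two vectors are unrelated everywhere except dimension 0, yet their cosine similarity is close to 1 and almost all of it comes from that one coordinate.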

Abstract

Similarity measures are a vital tool for understanding how language models represent and process language. Standard representational similarity measures such as cosine similarity and Euclidean distance have been successfully used in static word embedding models to understand how words cluster in semantic space. Recently, these measures have been applied to embeddings from contextualized models such as BERT and GPT-2. In this work, we call into question the informativity of such measures for contextualized language models. We find that a small number of rogue dimensions, often just 1-3, dominate these measures. Moreover, we find a striking mismatch between the dimensions that dominate similarity measures and those which are important to the behavior of the model. We show that simple postprocessing techniques such as standardization are able to correct for rogue dimensions and reveal…
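As a rough illustration of the postprocessing the abstract mentions, the sketch below z-scores each dimension over a sample of embeddings and compares the mean pairwise cosine similarity before and after. It assumes standardization means subtracting each dimension's mean and dividing by its standard deviation over the sample; the helper names, sample size, and planted offset of 50 are hypothetical, and the data is synthetic rather than drawn from any model.

```python
import numpy as np


def standardize(embeddings):
    """Z-score every dimension using its mean and std over the sample."""
    x = np.asarray(embeddings, dtype=float)
    mean = x.mean(axis=0)
    std = x.std(axis=0)
    std[std == 0] = 1.0  # leave constant dimensions unscaled
    return (x - mean) / std


def mean_pairwise_cosine(x):
    """Average cosine similarity over all pairs of distinct rows."""
    unit = x / np.linalg.norm(x, axis=1, keepdims=True)
    sims = unit @ unit.T
    n = len(x)
    # Subtract the n self-similarities (each equal to 1) on the diagonal.
    return (sims.sum() - n) / (n * (n - 1))


# Synthetic sample: 200 unrelated vectors of width 64 (assumed sizes).
rng = np.random.default_rng(0)
x = rng.standard_normal((200, 64))

# Hypothetical rogue dimension: a large shared offset in dimension 0.
x[:, 0] = 50.0 + rng.standard_normal(200)

before = mean_pairwise_cosine(x)
after = mean_pairwise_cosine(standardize(x))

print(f"mean pairwise cosine before standardization: {before:.3f}")
print(f"mean pairwise cosine after standardization:  {after:.3f}")
```

Here the raw vectors look nearly identical under cosine similarity only because of the shared offset; once every dimension is put on the same scale, the average similarity falls to roughly zero, which matches how the vectors were generated.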


Code & Models

Repositories

wtimkey/rogue-dimensions · Official


Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Cosine Annealing · Linear Warmup With Cosine Annealing · Attention Dropout · WordPiece · Byte Pair Encoding · Dense Connections