All Bark and No Bite: Rogue Dimensions in Transformer Language Models Obscure Representational Quality
William Timkey, Marten van Schijndel

TL;DR
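This paper shows that a few rogue dimensions dominate similarity measures such as cosine similarity and Euclidean distance in contextual language models like BERT and GPT-2, which makes analyses built on those measures misleading. It also shows that simple postprocessing, such as standardization, corrects the problem and gives a clearer view of model representations.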
Contribution
The paper identifies the impact of rogue dimensions on similarity measures in contextual models and demonstrates that standardization mitigates the problem, making those measures more informative about what the model represents.
Findings
A small number of rogue dimensions dominate similarity measures.
Standardization corrects for rogue dimensions and reveals true model representations.
Rogue dimensions cause a mismatch between similarity measures and model behavior.
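To make the first finding concrete, here is a minimal sketch of how a single dimension can dominate cosine similarity. It is not the authors' code: the vectors are synthetic, and the names `cosine_contributions` and `make_vector`, the dimensionality, and the rogue-dimension mean of 50 are all assumptions chosen for illustration. The sketch splits cos(u, v) into one additive term per dimension, u_i * v_i / (|u| |v|), and reports the share owed to the planted dimension.

```python
import math
import random


def cosine_contributions(u, v):
    """Split cos(u, v) into one additive term per dimension."""
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(x * x for x in v))
    return [a * b / (norm_u * norm_v) for a, b in zip(u, v)]


def make_vector(rng, dim=16, rogue_index=3, rogue_mean=50.0):
    """Synthetic stand-in for an embedding (hypothetical, not model output):
    unit-variance noise plus one dimension with a large mean."""
    vec = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    vec[rogue_index] += rogue_mean
    return vec


rng = random.Random(0)
u, v = make_vector(rng), make_vector(rng)

contribs = cosine_contributions(u, v)
cosine = sum(contribs)  # the per-dimension terms add up to cos(u, v)
top_dim = max(range(len(contribs)), key=contribs.__getitem__)
rogue_share = contribs[3] / cosine

print(f"cosine similarity: {cosine:.3f}")
print(f"largest contributor: dimension {top_dim}")
print(f"share from dimension 3: {rogue_share:.3f}")
```

In this toy setting the two vectors share nothing except the planted dimension, yet their cosine similarity is close to 1 and almost all of it comes from that one term.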
Abstract
Similarity measures are a vital tool for understanding how language models represent and process language. Standard representational similarity measures such as cosine similarity and Euclidean distance have been successfully used in static word embedding models to understand how words cluster in semantic space. Recently, these measures have been applied to embeddings from contextualized models such as BERT and GPT-2. In this work, we call into question the informativity of such measures for contextualized language models. We find that a small number of rogue dimensions, often just 1-3, dominate these measures. Moreover, we find a striking mismatch between the dimensions that dominate similarity measures and those which are important to the behavior of the model. We show that simple postprocessing techniques such as standardization are able to correct for rogue dimensions and reveal…
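The standardization step mentioned in the abstract can be sketched as follows, assuming it means z-scoring each dimension with the mean and standard deviation computed over a collection of vectors. The data is synthetic and the helper names (`standardize`, `mean_pairwise_cosine`) are hypothetical; this illustrates the idea and is not the paper's implementation.

```python
import math
import random
from statistics import mean, pstdev


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def standardize(vectors):
    """Z-score each dimension using statistics taken over all vectors.
    A constant dimension (std of 0) would need a guard; omitted here."""
    columns = list(zip(*vectors))
    mus = [mean(col) for col in columns]
    sigmas = [pstdev(col) for col in columns]
    return [[(x - m) / s for x, m, s in zip(vec, mus, sigmas)]
            for vec in vectors]


def mean_pairwise_cosine(vectors):
    sims = [cosine(vectors[i], vectors[j])
            for i in range(len(vectors))
            for j in range(i + 1, len(vectors))]
    return mean(sims)


# Synthetic vectors (an assumption): independent noise in every dimension,
# plus a large constant offset in dimension 3 acting as the rogue dimension.
rng = random.Random(0)
vectors = []
for _ in range(200):
    vec = [rng.gauss(0.0, 1.0) for _ in range(16)]
    vec[3] += 50.0
    vectors.append(vec)

raw_avg = mean_pairwise_cosine(vectors)
std_avg = mean_pairwise_cosine(standardize(vectors))

print(f"average cosine, raw:          {raw_avg:.3f}")
print(f"average cosine, standardized: {std_avg:.3f}")
```

Because the synthetic vectors are unrelated apart from the shared offset, the raw average sits near 1 while the standardized average falls to roughly 0, which is the kind of correction the abstract describes.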
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Cosine Annealing · Linear Warmup With Cosine Annealing · Attention Dropout · WordPiece · Byte Pair Encoding · Dense Connections
