Visualizing and Measuring the Geometry of BERT
Andy Coenen, Emily Reif, Ann Yuan, Been Kim, Adam Pearce, Fernanda, Vi\'egas, Martin Wattenberg

TL;DR
This paper investigates the internal geometric representations of BERT, revealing semantic and syntactic subspaces, fine-grained word sense geometry, and detailed analyses of attention and embeddings.
Contribution
It provides the first comprehensive geometric analysis of BERT's internal representations, highlighting semantic and syntactic subspaces and word sense geometry.
Findings
Semantic and syntactic features are represented in separate subspaces.
Evidence of fine-grained geometric structure for word senses.
Detailed empirical analysis of attention and embedding geometries.
Abstract
Transformer architectures show significant promise for natural language processing. Given that a single pretrained model can be fine-tuned to perform well on many different tasks, these networks appear to extract generally useful linguistic features. A natural question is how such networks represent this information internally. This paper describes qualitative and quantitative investigations of one particularly effective model, BERT. At a high level, linguistic features seem to be represented in separate semantic and syntactic subspaces. We find evidence of a fine-grained geometric representation of word senses. We also present empirical descriptions of syntactic representations in both attention matrices and individual word embeddings, as well as a mathematical argument to explain the geometry of these representations.
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Code & Models
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsTopic Modeling · Natural Language Processing Techniques · Advanced Graph Neural Networks
MethodsLinear Layer · Weight Decay · Residual Connection · Adam · Layer Normalization · Softmax · Attention Is All You Need · Dropout · Refunds@Expedia|||How do I get a full refund from Expedia? · Multi-Head Attention
