Emergent Properties of Finetuned Language Representation Models
Alexandre Matton, Luke de Oliveira

TL;DR
This paper analyzes the output vectors of finetuned BERT models. It finds that the [CLS] embedding is highly redundant and that specific output dimensions matter disproportionately, which suggests such models could be made more efficient with minimal loss of accuracy.
Contribution
The paper provides an empirical analysis of information redundancy in BERT's output vectors and identifies key output dimensions, improving understanding of model efficiency and over-parameterization.
Findings
[CLS] embedding contains highly redundant information
Redundant information can be compressed with minimal accuracy loss
Specific output dimensions can achieve competitive results independently
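The third finding can be illustrated with a small probe that classifies using one embedding dimension at a time. This is a minimal sketch on synthetic data, not the paper's experimental setup: the helper names (`make_synthetic_embeddings`, `single_dim_accuracy`), the 8-dimensional vectors, the planted informative dimension, and the mean-midpoint threshold rule are all assumptions made for illustration.

```python
import random


def make_synthetic_embeddings(n=400, dim=8, informative_dim=3, seed=0):
    """Build toy 'embeddings' where only one dimension carries the label.

    Hypothetical stand-in for real [CLS] vectors; all values are synthetic.
    """
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        label = rng.randint(0, 1)
        vec = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        # Plant the signal: the class mean is +2 or -2 on the informative dimension.
        vec[informative_dim] = (2.0 if label else -2.0) + rng.gauss(0.0, 0.5)
        X.append(vec)
        y.append(label)
    return X, y


def single_dim_accuracy(X, y, d):
    """Accuracy of a threshold classifier that only looks at dimension d.

    The threshold is the midpoint of the two class means. Accuracy is
    measured in-sample, which is enough for a sketch but not for a real study.
    """
    pos = [x[d] for x, t in zip(X, y) if t == 1]
    neg = [x[d] for x, t in zip(X, y) if t == 0]
    mean_pos = sum(pos) / len(pos)
    mean_neg = sum(neg) / len(neg)
    threshold = (mean_pos + mean_neg) / 2.0
    sign = 1.0 if mean_pos >= mean_neg else -1.0
    correct = 0
    for x, t in zip(X, y):
        pred = 1 if sign * (x[d] - threshold) > 0 else 0
        correct += int(pred == t)
    return correct / len(X)


X, y = make_synthetic_embeddings()
accuracies = [single_dim_accuracy(X, y, d) for d in range(len(X[0]))]
best_dim = max(range(len(accuracies)), key=lambda d: accuracies[d])
print("best dimension:", best_dim, "accuracy:", round(accuracies[best_dim], 3))
```

On real data, the same loop would run over [CLS] vectors extracted from a finetuned model, with accuracy measured on a held-out split.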
Abstract
Large, self-supervised transformer-based language representation models have recently received significant amounts of attention, and have produced state-of-the-art results across a variety of tasks simply by scaling up pre-training on larger and larger corpora. Such models usually produce high dimensional vectors, on top of which additional task-specific layers and architectural modifications are added to adapt them to specific downstream tasks. Though there exists ample evidence that such models work well, we aim to understand what happens when they work well. We analyze the redundancy and location of information contained in output vectors for one such language representation model -- BERT. We show empirical evidence that the [CLS] embedding in BERT contains highly redundant information, and can be compressed with minimal loss of accuracy, especially for finetuned models, dovetailing…
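The compression claim can be made concrete with a small experiment on synthetic vectors. This is a sketch under stated assumptions, not the paper's procedure: PCA (via NumPy's SVD) is swapped in as a generic linear compressor, the 64-dimensional vectors are generated from 10 hidden factors to mimic redundancy, and the names `pca_compress` and `pca_reconstruct` are hypothetical.

```python
import numpy as np


def pca_compress(X, k):
    """Project the rows of X onto their top-k principal components.

    Returns the compressed codes, the components, the mean, and the
    fraction of variance explained by the k components.
    """
    mean = X.mean(axis=0)
    Xc = X - mean
    # Rows of Vt are the principal directions, ordered by singular value.
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:k]
    Z = Xc @ components.T
    explained = float((S[:k] ** 2).sum() / (S ** 2).sum())
    return Z, components, mean, explained


def pca_reconstruct(Z, components, mean):
    """Map compressed codes back to the original space."""
    return Z @ components + mean


# Synthetic "redundant" embeddings: 64 observed dimensions driven by only
# 10 latent factors, plus a little noise (an assumption for illustration).
rng = np.random.default_rng(0)
latent = rng.normal(size=(500, 10))
mixing = rng.normal(size=(10, 64))
X = latent @ mixing + 0.01 * rng.normal(size=(500, 64))

Z, components, mean, explained = pca_compress(X, k=10)
X_hat = pca_reconstruct(Z, components, mean)
rel_err = float(np.linalg.norm(X - X_hat) / np.linalg.norm(X))
print("compressed shape:", Z.shape)
print("variance explained:", round(explained, 4))
print("relative reconstruction error:", round(rel_err, 4))
```

If real [CLS] vectors are redundant in the sense the abstract describes, a downstream classifier trained on `Z` rather than `X` would be expected to lose little accuracy. Checking that requires the actual model and task data.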
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Speech Recognition and Synthesis
Methods: Linear Layer · Residual Connection · Attention Dropout · Linear Warmup With Linear Decay · Weight Decay · Dense Connections · Adam · WordPiece · Softmax
