Emergent Properties of Finetuned Language Representation Models

Alexandre Matton; Luke de Oliveira

arXiv:1910.10832 · cs.CL · October 25, 2019 · 1 citation

TL;DR

This paper investigates the internal properties of finetuned BERT models, revealing redundancy in their output vectors and the importance of specific output dimensions, findings that could lead to more efficient models with minimal loss of accuracy.

Contribution

It provides an empirical analysis of information redundancy in BERT's output vectors and identifies key output dimensions, improving understanding of model efficiency and over-parameterization.

Findings

1. The [CLS] embedding contains highly redundant information.
2. This redundant information can be compressed with minimal loss of accuracy.
3. Specific output dimensions, used on their own, can achieve competitive results.
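
The third finding can be made concrete with a per-dimension probe: train a separate classifier on each single output dimension and compare held-out accuracies. The following is a minimal NumPy sketch under stated assumptions, not the authors' procedure. It assumes binary labels, uses a decision stump (a single threshold) as a stand-in single-feature classifier, and runs on synthetic vectors rather than real BERT outputs. The names `fit_stump`, `stump_accuracy`, and `per_dimension_probe` are hypothetical.

```python
import numpy as np


def fit_stump(x: np.ndarray, y: np.ndarray) -> tuple[float, int]:
    """Fit a 1-D threshold classifier (decision stump) for binary labels y in {0, 1}.

    Returns (threshold, direction): direction=+1 predicts 1 above the
    threshold, direction=-1 predicts 1 below it.
    """
    order = np.argsort(x)
    xs, ys = x[order], y[order]
    n = len(ys)
    # pos_left[i] = number of label-1 points among the i smallest values
    pos_left = np.concatenate(([0], np.cumsum(ys)))
    i = np.arange(n + 1)
    # Correct predictions if everything above split i is labeled 1:
    # negatives on the left plus positives on the right
    correct_up = (i - pos_left) + (pos_left[-1] - pos_left)
    correct_down = n - correct_up  # the mirrored rule
    # Candidate thresholds: below the minimum, all midpoints, above the maximum
    cuts = np.concatenate(([xs[0] - 1.0], (xs[:-1] + xs[1:]) / 2, [xs[-1] + 1.0]))
    if correct_up.max() >= correct_down.max():
        return float(cuts[correct_up.argmax()]), 1
    return float(cuts[correct_down.argmax()]), -1


def stump_accuracy(x: np.ndarray, y: np.ndarray, threshold: float, direction: int) -> float:
    """Accuracy of a fitted stump on (x, y)."""
    pred = (x > threshold) if direction == 1 else (x < threshold)
    return float(np.mean(pred.astype(int) == y))


def per_dimension_probe(X_train, y_train, X_test, y_test) -> np.ndarray:
    """Held-out accuracy of a separate stump trained on each single dimension."""
    accs = []
    for j in range(X_train.shape[1]):
        t, d = fit_stump(X_train[:, j], y_train)
        accs.append(stump_accuracy(X_test[:, j], y_test, t, d))
    return np.array(accs)


# Synthetic stand-in for [CLS] vectors: 32 noise dimensions, one of which
# (index 5, chosen arbitrarily for illustration) is shifted by the label.
rng = np.random.default_rng(0)
n, dim = 2000, 32
y = rng.integers(0, 2, size=n)
X = rng.normal(size=(n, dim))
X[:, 5] += 2.0 * y

accs = per_dimension_probe(X[:1000], y[:1000], X[1000:], y[1000:])
best_dim = int(np.argmax(accs))
print(best_dim, round(float(accs[best_dim]), 3))
```

On real embeddings one would replace `X` with [CLS] vectors extracted from a finetuned model and compare each single-dimension accuracy with a classifier trained on all dimensions.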

Abstract

Large, self-supervised transformer-based language representation models have recently received significant amounts of attention, and have produced state-of-the-art results across a variety of tasks simply by scaling up pre-training on larger and larger corpora. Such models usually produce high dimensional vectors, on top of which additional task-specific layers and architectural modifications are added to adapt them to specific downstream tasks. Though there exists ample evidence that such models work well, we aim to understand what happens when they work well. We analyze the redundancy and location of information contained in output vectors for one such language representation model -- BERT. We show empirical evidence that the [CLS] embedding in BERT contains highly redundant information, and can be compressed with minimal loss of accuracy, especially for finetuned models, dovetailing…
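
Redundancy of this kind is often checked by asking how many principal components are needed to retain most of the variance of the [CLS] vectors, then projecting onto those components. The abstract does not say which compression method the paper uses; the following minimal NumPy sketch assumes PCA as a stand-in and uses a synthetic low-rank matrix in place of real 768-dimensional BERT-base [CLS] embeddings. The function names `components_for_variance` and `pca_compress` are hypothetical.

```python
import numpy as np


def components_for_variance(X: np.ndarray, target: float = 0.99) -> int:
    """Smallest number of principal components whose explained variance reaches `target`."""
    Xc = X - X.mean(axis=0)
    s = np.linalg.svd(Xc, compute_uv=False)  # singular values, descending
    ratio = np.cumsum(s**2) / np.sum(s**2)   # cumulative explained-variance ratio
    return int(np.searchsorted(ratio, target) + 1)


def pca_compress(X: np.ndarray, k: int):
    """Project the rows of X onto their top-k principal directions.

    Returns (codes, components, mean); codes @ components + mean approximates X.
    """
    mean = X.mean(axis=0)
    Xc = X - mean
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    components = vt[:k]        # (k, d), orthonormal rows
    codes = Xc @ components.T  # (n, k) compressed representation
    return codes, components, mean


# Synthetic stand-in for n [CLS] vectors of width 768: rank-16 signal plus small noise.
rng = np.random.default_rng(0)
n, d, r = 1000, 768, 16
X = rng.normal(size=(n, r)) @ rng.normal(size=(r, d)) + 0.05 * rng.normal(size=(n, d))

k = components_for_variance(X, 0.99)
codes, components, mean = pca_compress(X, k)
X_hat = codes @ components + mean
rel_err = np.linalg.norm(X - X_hat) / np.linalg.norm(X)
print(k, codes.shape, round(float(rel_err), 4))
```

Because the retained components explain at least 99% of the centered variance, the relative reconstruction error is bounded by 0.1. A natural next step, not shown here, is to train the downstream classifier on `codes` instead of the full vectors and compare accuracy.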

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Speech Recognition and Synthesis

Methods: Linear Layer · Residual Connection · Attention Dropout · Linear Warmup With Linear Decay · Weight Decay · Dense Connections · Adam · WordPiece · Softmax