A Closer Look at How Fine-tuning Changes BERT

Yichu Zhou; Vivek Srikumar

arXiv:2106.14282 · cs.CL · March 17, 2022 · 6 cites

TL;DR

This paper investigates how fine-tuning BERT alters its embedding space. It finds that fine-tuning mainly adapts representations to the specific task while maintaining their original structure, and it reports an exception to the assumption that fine-tuning always improves performance.

Contribution

The paper analyzes the effects of fine-tuning on the embedding space of the English BERT family, using two probing techniques and experiments on five NLP tasks.

Findings

01 Fine-tuning increases distances between differently labeled examples.

02 Fine-tuning does not significantly distort the original embedding structure.

03 There exists an exception where fine-tuning does not improve performance.
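
The first finding can be made concrete with a small sketch. The code below is not the paper's probing method. It is a simple illustration that assumes embeddings are available as NumPy arrays, and it measures the smallest Euclidean distance between any two examples that carry different labels. The function name `min_inter_label_distance` and the toy points are hypothetical and chosen only for this example.

```python
import numpy as np


def min_inter_label_distance(embeddings, labels):
    """Smallest Euclidean distance between two examples with different labels."""
    embeddings = np.asarray(embeddings, dtype=float)
    labels = np.asarray(labels)
    # Pairwise Euclidean distances, shape (n, n).
    diffs = embeddings[:, None, :] - embeddings[None, :, :]
    dists = np.linalg.norm(diffs, axis=-1)
    # Keep only pairs whose labels differ.
    different = labels[:, None] != labels[None, :]
    return float(dists[different].min())


# Toy 2-D points standing in for embeddings (made up, not data from the paper).
labels = np.array([0, 0, 1, 1])
before = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
after = np.array([[0.0, 0.0], [0.0, 1.0], [3.0, 0.0], [3.0, 1.0]])

d_before = min_inter_label_distance(before, labels)
d_after = min_inter_label_distance(after, labels)
print(d_before, d_after)
```

In practice, `before` and `after` would hold representations of the same examples taken from a pre-trained and a fine-tuned model. A larger value after fine-tuning would be consistent with the first finding, though this single number is a much coarser measure than a probing analysis.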

Abstract

Given the prevalence of pre-trained contextualized representations in today's NLP, there have been many efforts to understand what information they contain, and why they seem to be universally successful. The most common approach to use these representations involves fine-tuning them for an end task. Yet, how fine-tuning changes the underlying embedding space is less studied. In this work, we study the English BERT family and use two probing techniques to analyze how fine-tuning changes the space. We hypothesize that fine-tuning affects classification performance by increasing the distances between examples associated with different labels. We confirm this hypothesis with carefully designed experiments on five different NLP tasks. Via these experiments, we also discover an exception to the prevailing wisdom that "fine-tuning always improves performance". Finally, by comparing the…

Code & Models

Repositories

utahnlp/BERT-fine-tuning-analysis (PyTorch, official)

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling

Methods: Linear Layer · Attention Is All You Need · Weight Decay · WordPiece · Adam · Dropout · Layer Normalization · Multi-Head Attention · Linear Warmup With Linear Decay