On the Interplay Between Fine-tuning and Sentence-level Probing for Linguistic Knowledge in Pre-trained Transformers

Marius Mosbach; Anna Khokhlova; Michael A. Hedderich; Dietrich Klakow

arXiv:2010.02616 · cs.CL · October 7, 2020

TL;DR

This paper investigates how fine-tuning affects the linguistic knowledge in pre-trained models like BERT, RoBERTa, and ALBERT using sentence-level probing, revealing that fine-tuning can both enhance and diminish linguistic representations depending on the task and model.

Contribution

It provides a detailed analysis of the impact of fine-tuning on linguistic knowledge in pre-trained transformers, highlighting variability across models and tasks.

Findings

1. Fine-tuning causes substantial changes in probing accuracy for some tasks.

2. Changes in representations are larger in the higher layers of the models.

3. Only in a few cases does fine-tuning improve probing accuracy beyond what the pre-trained model already achieves with a strong pooling method.
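Finding 2 concerns how much each layer's representations change during fine-tuning. One common way to quantify this is linear centered kernel alignment (CKA) between a layer's pre-trained and fine-tuned activations on the same inputs. This is an assumption for illustration: the listing does not say which similarity measure the paper uses. The sketch below uses pure Python. The "hidden states" are synthetic random vectors (hypothetical, not model outputs), and the noise added to each layer grows with depth purely to mimic the reported trend. The helper names are also hypothetical.

```python
import math
import random

def _center(rows):
    """Subtract the per-feature mean so CKA ignores constant offsets."""
    n, dim = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(dim)]
    return [[r[j] - means[j] for j in range(dim)] for r in rows]

def _gram(rows):
    """Linear kernel K = X X^T over examples (n x n)."""
    return [[sum(a * b for a, b in zip(ri, rj)) for rj in rows] for ri in rows]

def _frob_inner(a, b):
    """Frobenius inner product <A, B> = sum_ij A_ij * B_ij."""
    return sum(x * y for ra, rb in zip(a, b) for x, y in zip(ra, rb))

def linear_cka(x, y):
    """Linear CKA between two (n_examples x dim) representation matrices."""
    kx, ky = _gram(_center(x)), _gram(_center(y))
    return _frob_inner(kx, ky) / math.sqrt(_frob_inner(kx, kx) * _frob_inner(ky, ky))

def layerwise_change(pretrained_layers, finetuned_layers):
    """1 - CKA per layer: 0 means unchanged, larger means the layer moved more."""
    return [1.0 - linear_cka(p, f) for p, f in zip(pretrained_layers, finetuned_layers)]

# Toy usage with synthetic activations (hypothetical): 3 layers, 40 inputs, 8 dims.
# Perturbation strength grows with depth to mimic larger changes in higher layers.
rng = random.Random(0)
n, dim = 40, 8
pre = [[[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n)] for _ in range(3)]
fine = [
    [[v + scale * rng.gauss(0, 1) for v in row] for row in layer]
    for layer, scale in zip(pre, (0.0, 0.5, 2.0))
]
changes = layerwise_change(pre, fine)
print([round(c, 3) for c in changes])
```

Linear CKA is invariant to isotropic scaling and orthogonal transformations of either representation. A layer can therefore be rescaled or rotated during fine-tuning without registering as "changed", which is one reason such a measure is used instead of raw distances.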

Abstract

Fine-tuning pre-trained contextualized embedding models has become an integral part of the NLP pipeline. At the same time, probing has emerged as a way to investigate the linguistic knowledge captured by pre-trained models. Very little is, however, understood about how fine-tuning affects the representations of pre-trained models and thereby the linguistic knowledge they encode. This paper contributes towards closing this gap. We study three different pre-trained models: BERT, RoBERTa, and ALBERT, and investigate through sentence-level probing how fine-tuning affects their representations. We find that for some probing tasks fine-tuning leads to substantial changes in accuracy, possibly suggesting that fine-tuning introduces or even removes linguistic knowledge from a pre-trained model. These changes, however, vary greatly across different models, fine-tuning and probing tasks. Our…
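Sentence-level probing, as described in the abstract, freezes the encoder, pools its token representations into one sentence vector, and trains a simple classifier on that vector. The probe's accuracy is then compared for the pre-trained and the fine-tuned model. The minimal sketch below assumes two common pooling choices (first-token "[CLS]" pooling and masked mean pooling) and a logistic-regression probe. It is not the paper's exact setup. The toy token vectors, labels, and function names are hypothetical stand-ins for real encoder outputs and a real probing dataset.

```python
import math
import random

def cls_pool(tokens):
    """Sentence vector = hidden state of the first token (e.g. [CLS] / <s>)."""
    return list(tokens[0])

def mean_pool(tokens, mask):
    """Average the hidden states of real tokens (mask == 1), skipping padding."""
    kept = [t for t, m in zip(tokens, mask) if m]
    dim = len(tokens[0])
    return [sum(t[j] for t in kept) / len(kept) for j in range(dim)]

def _sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def train_probe(features, labels, epochs=200, lr=0.1):
    """Binary logistic-regression probe fit with full-batch gradient descent."""
    n, dim = len(features), len(features[0])
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, y in zip(features, labels):
            err = _sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            for j in range(dim):
                grad_w[j] += err * x[j]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def probe_accuracy(w, b, features, labels):
    """Fraction of examples whose predicted class (logit > 0) matches the label."""
    preds = [int(sum(wi * xi for wi, xi in zip(w, x)) + b > 0) for x in features]
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)

# Toy data (hypothetical): 100 "sentences" of 3-6 token vectors, padded to 6.
# The label shifts feature 0 of every real token, so a linear probe can recover it.
rng = random.Random(0)
sentences, masks, labels = [], [], []
for i in range(100):
    y = i % 2
    shift = 1.0 if y else -1.0
    length = rng.randint(3, 6)
    toks = [[rng.gauss(0, 1) + (shift if j == 0 else 0.0) for j in range(4)]
            for _ in range(length)]
    toks += [[0.0] * 4 for _ in range(6 - length)]  # padding positions
    sentences.append(toks)
    masks.append([1] * length + [0] * (6 - length))
    labels.append(y)

feats = [mean_pool(t, m) for t, m in zip(sentences, masks)]
w, b = train_probe(feats, labels)
acc = probe_accuracy(w, b, feats, labels)
print(f"mean-pooling probe accuracy (train): {acc:.2f}")
```

In an actual probing study, the same probe would be trained on a training split and scored on a held-out split, once with features from the pre-trained encoder and once with features from the fine-tuned encoder. Swapping `mean_pool` for `cls_pool` shows why the choice of pooling can matter as much as fine-tuning itself.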

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Interpreting and Communication in Healthcare

Methods: Linear Layer · Dense Connections · Layer Normalization · WordPiece · Multi-Head Attention · Dropout · Linear Warmup With Linear Decay · Attention Dropout · Weight Decay · LAMB
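The Methods list includes "Linear Warmup With Linear Decay", the learning-rate schedule commonly used to fine-tune BERT-style models. A minimal sketch of that schedule follows. The learning rate rises linearly from 0 to a peak over the warmup steps, then falls linearly to 0 at the final step. The function name and the example values (peak 2e-5, 100 warmup steps, 1000 total steps) are hypothetical and are not taken from the paper's configuration.

```python
def linear_warmup_linear_decay(step, peak_lr, warmup_steps, total_steps):
    """Learning rate at `step`: linear ramp 0 -> peak_lr, then linear decay to 0."""
    if warmup_steps > 0 and step < warmup_steps:
        return peak_lr * step / warmup_steps
    decay_span = max(1, total_steps - warmup_steps)  # guard against division by zero
    remaining = max(0, total_steps - step)           # clamp to 0 after the last step
    return peak_lr * remaining / decay_span

# Hypothetical fine-tuning run: peak 2e-5, 100 warmup steps, 1000 steps in total.
for s in (0, 50, 100, 550, 1000):
    print(s, linear_warmup_linear_decay(s, 2e-5, 100, 1000))
```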