How does the pre-training objective affect what large language models learn about linguistic properties?

Ahmed Alajrami; Nikolaos Aletras

arXiv:2203.10415 · cs.CL · March 22, 2022

TL;DR

This study investigates how different pre-training objectives influence what large language models like BERT learn about linguistic properties, revealing minimal differences between linguistically motivated and non-motivated objectives.

Contribution

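The paper pre-trains BERT with two linguistically motivated and three non-linguistically motivated objectives and shows through probing that the resulting representations encode linguistic properties to a similar degree, challenging the assumption that linguistically motivated objectives lead to better linguistic knowledge.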

Findings

01

Small differences in linguistic probing performance between objectives

02

Linguistically motivated objectives do not significantly outperform others

03

Questions the importance of linguistically informed pre-training

Abstract

Several pre-training objectives, such as masked language modeling (MLM), have been proposed to pre-train language models (e.g. BERT) with the aim of learning better language representations. However, to the best of our knowledge, no previous work so far has investigated how different pre-training objectives affect what BERT learns about linguistic properties. We hypothesize that linguistically motivated objectives such as MLM should help BERT to acquire better linguistic knowledge compared to non-linguistically motivated objectives, i.e. objectives that are not intuitive or for which it is hard for humans to guess the association between the input and the label to be predicted. To this end, we pre-train BERT with two linguistically motivated objectives and three non-linguistically motivated ones. We then probe for linguistic characteristics encoded in the representations of the resulting models. We find strong…
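The abstract cites masked language modeling (MLM) as the example of a linguistically motivated objective. To make concrete how MLM turns raw text into (input, label) pairs, here is a minimal Python sketch of BERT-style input corruption. It assumes the 15% selection rate and 80/10/10 replacement split of the original BERT recipe; the paper's exact configuration may differ. The function name `mask_tokens`, the `IGNORE_INDEX` constant and the token ids in the usage lines are hypothetical illustrations, not taken from the authors' repository.

```python
import random
from typing import FrozenSet, List, Optional, Tuple

# Hypothetical constant: label for positions excluded from the loss
# (matches the default ignore_index of PyTorch's cross-entropy loss).
IGNORE_INDEX = -100


def mask_tokens(
    token_ids: List[int],
    mask_id: int,
    vocab_size: int,
    special_ids: FrozenSet[int] = frozenset(),
    mask_prob: float = 0.15,
    rng: Optional[random.Random] = None,
) -> Tuple[List[int], List[int]]:
    """Build (inputs, labels) for one sequence under BERT-style MLM corruption (sketch)."""
    if rng is None:
        rng = random.Random()
    inputs = list(token_ids)
    labels = [IGNORE_INDEX] * len(token_ids)
    for i, tok in enumerate(token_ids):
        # Special tokens are never selected; other positions are selected with probability mask_prob.
        if tok in special_ids or rng.random() >= mask_prob:
            continue
        labels[i] = tok  # the model is trained to recover the original token at this position
        r = rng.random()
        if r < 0.8:
            inputs[i] = mask_id  # 80%: replace with the [MASK] id
        elif r < 0.9:
            inputs[i] = rng.randrange(vocab_size)  # 10%: replace with a random token id
        # Remaining 10%: leave the original token in place.
    return inputs, labels


# Usage with made-up ids: 101/102/103 stand in for [CLS]/[SEP]/[MASK].
ids = [101] + list(range(1000, 1050)) + [102]
inputs, labels = mask_tokens(
    ids, mask_id=103, vocab_size=30522,
    special_ids=frozenset({101, 102}), rng=random.Random(42),
)
selected = sum(label != IGNORE_INDEX for label in labels)
print(f"{selected} of {len(ids)} positions selected for prediction")
```

Only the selected positions carry a label, so the loss is computed solely over tokens the model has to reconstruct from context; a non-linguistically motivated objective would keep the same input pipeline but swap in a label that has a less intuitive relation to the input.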

Code & Models

Repositories

aajrami/acl2022-pre-training-objectives-probing
PyTorch · Official

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Attention Dropout · Residual Connection · Linear Warmup With Linear Decay · Dense Connections · Weight Decay · WordPiece