How does the pre-training objective affect what large language models learn about linguistic properties?
Ahmed Alajrami, Nikolaos Aletras

TL;DR
This study investigates how different pre-training objectives influence what large language models like BERT learn about linguistic properties, revealing minimal differences between linguistically motivated and non-motivated objectives.
Contribution
The paper compares linguistically motivated and non-motivated pre-training objectives, showing they produce similar linguistic representations in BERT, challenging existing assumptions.
Findings
Small differences in linguistic probing performance between objectives
Linguistically motivated objectives do not significantly outperform others
Questions the importance of linguistically informed pre-training
Abstract
Several pre-training objectives, such as masked language modeling (MLM), have been proposed to pre-train language models (e.g. BERT) with the aim of learning better language representations. However, to the best of our knowledge, no previous work has investigated how different pre-training objectives affect what BERT learns about linguistic properties. We hypothesize that linguistically motivated objectives such as MLM should help BERT to acquire better linguistic knowledge compared to non-linguistically motivated objectives, for which the association between the input and the label to be predicted is unintuitive or hard for humans to guess. To this end, we pre-train BERT with two linguistically motivated objectives and three non-linguistically motivated ones. We then probe for linguistic characteristics encoded in the representation of the resulting models. We find strong…
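To make the MLM objective named in the abstract concrete, here is a minimal sketch of how one masked training example could be built. It assumes the commonly described BERT recipe (select about 15% of tokens; replace 80% of those with `[MASK]`, 10% with a random token, and leave 10% unchanged), which this page does not confirm as the authors' exact setup. The function name `mask_tokens`, the toy vocabulary, and the whitespace tokenization are hypothetical and used only for illustration.

```python
import random

MASK_TOKEN = "[MASK]"


def mask_tokens(tokens, vocab, mask_prob=0.15, seed=0):
    """Build one MLM example: returns (corrupted_tokens, labels).

    labels[i] is the original token at positions selected for prediction
    and None elsewhere. The 80/10/10 split is an assumption taken from
    the commonly described BERT recipe, not from this page.
    """
    rng = random.Random(seed)
    corrupted, labels = [], []
    for token in tokens:
        if rng.random() < mask_prob:
            # Position selected: the model must recover the original token.
            labels.append(token)
            roll = rng.random()
            if roll < 0.8:
                corrupted.append(MASK_TOKEN)         # replace with [MASK]
            elif roll < 0.9:
                corrupted.append(rng.choice(vocab))  # replace with a random token
            else:
                corrupted.append(token)              # keep the token unchanged
        else:
            # Position not selected: no prediction target here.
            labels.append(None)
            corrupted.append(token)
    return corrupted, labels


if __name__ == "__main__":
    sentence = "the cat sat on the mat".split()
    toy_vocab = ["dog", "ran", "under", "a", "tree"]  # hypothetical vocabulary
    noisy, targets = mask_tokens(sentence, toy_vocab, mask_prob=0.5, seed=1)
    print(noisy)
    print(targets)
```

The model is trained to predict the non-`None` entries of `labels` from the corrupted sequence. A non-linguistically motivated objective would keep this input pipeline but swap in a label that has no intuitive relation to the input text.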
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Attention Dropout · Residual Connection · Linear Warmup With Linear Decay · Dense Connections · Weight Decay · WordPiece
