Temporal Adaptation of BERT and Performance on Downstream Document Classification: Insights from Social Media
Paul Röttger, Janet B. Pierrehumbert

TL;DR
This study explores how adapting BERT to temporal changes in social media language over three years affects downstream document classification, revealing that temporal adaptation benefits upstream tasks but has limited impact on classification performance.
Contribution
Introduces a temporal social media corpus and evaluates the effects of temporal adaptation on BERT's downstream classification, highlighting when it is beneficial.
Findings
Temporal adaptation improves upstream masked language modeling.
Time-specific models generally perform better on past test sets than on future ones.
Temporal adaptation does not significantly enhance downstream classification.
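The past-versus-future comparison in these findings can be pictured as a grid in which every time-specific model is evaluated on every time-specific test set. The sketch below only labels the cells of such a grid. It is a minimal illustration, not the authors' code: the period labels, the `YYYY-MM` format, and the function names `relative_position` and `evaluation_grid` are all assumptions made for this example.

```python
from itertools import product


def relative_position(model_period: str, test_period: str) -> str:
    """Label a test period as past, same, or future relative to the model's period.

    Periods are 'YYYY-MM' strings (an assumed format), which sort
    chronologically when compared as plain strings.
    """
    if test_period < model_period:
        return "past"
    if test_period > model_period:
        return "future"
    return "same"


def evaluation_grid(periods):
    """Pair every model period with every test period and label each pair."""
    return {
        (model, test): relative_position(model, test)
        for model, test in product(periods, repeat=2)
    }


# Hypothetical period labels, chosen only for illustration.
periods = ["2017-04", "2018-04", "2019-04"]
grid = evaluation_grid(periods)

for (model, test), label in sorted(grid.items()):
    print(f"model from {model}, test set from {test}: {label}")
```

In a real experiment each cell would hold a score (for example, masked language modelling loss or classification F1), and the finding above corresponds to the "past" cells tending to score better than the "future" cells.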
Abstract
Language use differs between domains and even within a domain, language use changes over time. For pre-trained language models like BERT, domain adaptation through continued pre-training has been shown to improve performance on in-domain downstream tasks. In this article, we investigate whether temporal adaptation can bring additional benefits. For this purpose, we introduce a corpus of social media comments sampled over three years. It contains unlabelled data for adaptation and evaluation on an upstream masked language modelling task as well as labelled data for fine-tuning and evaluation on a downstream document classification task. We find that temporality matters for both tasks: temporal adaptation improves upstream and temporal fine-tuning downstream task performance. Time-specific models generally perform better on past than on future test sets, which matches evidence on the…
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Text Readability and Simplification
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Dropout · Adam · Dense Connections · Softmax · Linear Warmup With Linear Decay · Attention Dropout
