Improving Temporal Generalization of Pre-trained Language Models with Lexical Semantic Change
Zhaochen Su, Zecheng Tang, Xinyan Guan, Juntao Li, Lijun Wu, Min Zhang

TL;DR
This paper introduces a lexical-level masking strategy to improve the temporal generalization of pre-trained language models by addressing lexical semantic change, outperforming traditional continual training methods.
Contribution
The paper proposes a novel lexical-level masking approach for post-training language models to enhance their ability to adapt over time, focusing on lexical semantic change.
Findings
The proposed method improves temporal generalization on multiple datasets.
It outperforms existing continual training approaches.
The approach is effective across different models and tasks.
Abstract
Recent research has revealed that neural language models at scale suffer from poor temporal generalization capability, i.e., the language model pre-trained on static data from past years performs worse over time on emerging data. Existing methods mainly perform continual training to mitigate such a misalignment. While effective to some extent but is far from being addressed on both the language modeling and downstream tasks. In this paper, we empirically observe that temporal generalization is closely affiliated with lexical semantic change, which is one of the essential phenomena of natural languages. Based on this observation, we propose a simple yet effective lexical-level masking strategy to post-train a converged language model. Experiments on two pre-trained language models, two different classification tasks, and four benchmark datasets demonstrate the effectiveness of our…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Code & Models
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsTopic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications
