The future is different: Large pre-trained language models fail in prediction tasks
Kostadin Cvejoski, Ramsés J. Sánchez, César Ojeda

TL;DR
This paper investigates the performance decline of large pre-trained language models over time in dynamic data environments and proposes a lightweight, interpretable method using neural variational topic models to mitigate this issue.
Contribution
It introduces a novel methodology combining neural variational dynamic topic models and attention mechanisms to improve temporal robustness of language models in prediction tasks.
Findings
LPLMs show average performance drops of about 88%, even in the best case, when predicting the popularity of future posts.
The proposed method reduces the performance drop to about 40%.
The proposed models use only 7% of the parameters of LPLMs and offer interpretability.
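The percentages above compare how a model does on data from its training period with how it does on later data. This summary does not state the paper's exact metric, so the following is only a minimal sketch under two assumptions: that the error measure is RMSE (lower is better), and that the drop is the relative increase in error on future posts. The function names and the toy numbers are hypothetical and are not taken from the paper.

```python
import math


def rmse(predictions, targets):
    """Root mean squared error between two equal-length sequences."""
    if len(predictions) != len(targets) or not predictions:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(p - t) ** 2 for p, t in zip(predictions, targets)]
    return math.sqrt(sum(squared) / len(squared))


def relative_drop(in_period_error, future_error):
    """Percentage increase in error when moving from in-period to future data.

    Assumption: error is a lower-is-better metric such as RMSE.
    """
    if in_period_error <= 0:
        raise ValueError("in_period_error must be positive")
    return 100.0 * (future_error - in_period_error) / in_period_error


# Toy popularity scores (hypothetical): held-out posts from the training
# period versus posts written after the training period ended.
in_period = rmse([2.0, 4.0], [1.0, 5.0])
future = rmse([2.0, 4.0], [0.0, 6.0])
drop = relative_drop(in_period, future)
print(f"in-period RMSE={in_period:.2f}, future RMSE={future:.2f}, drop={drop:.0f}%")
```

Under this definition, a drop of 100% means the error on future posts is twice the in-period error. A higher-is-better metric would need the sign of the numerator flipped.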
Abstract
Large pre-trained language models (LPLM) have shown spectacular success when fine-tuned on downstream supervised tasks. Yet, it is known that their performance can drastically drop when there is a distribution shift between the data used during training and that used at inference time. In this paper we focus on data distributions that naturally change over time and introduce four new REDDIT datasets, namely the WALLSTREETBETS, ASKSCIENCE, THE DONALD, and POLITICS sub-reddits. First, we empirically demonstrate that LPLM can display average performance drops of about 88% (in the best case!) when predicting the popularity of future posts from sub-reddits whose topic distribution changes with time. We then introduce a simple methodology that leverages neural variational dynamic topic models and attention mechanisms to infer temporal language model representations for regression tasks. Our…
Taxonomy
Topics: Topic Modeling · Computational and Text Analysis Methods · Natural Language Processing Techniques
