Mind the Gap: Assessing Temporal Generalization in Neural Language Models
Angeliki Lazaridou, Adhiguna Kuncoro, Elena Gribovskaya, Devang Agrawal, Adam Liska, Tayfun Terzi, Mai Gimenez, Cyprien de Masson d'Autume, Tomas Kocisky, Sebastian Ruder, Dani Yogatama, Kris Cao, Susannah Young, Phil Blunsom

TL;DR
This paper shows that Transformer-XL language models trained on a fixed snapshot of data perform increasingly worse at predicting text written after their training period, and argues for models that are continually updated to maintain performance.
Contribution
It demonstrates that increasing model size alone does not solve temporal degradation, and that continually updating a language model with new information can mitigate it.
Findings
Performance degrades over time when predicting future data
Larger models do not inherently solve temporal generalization issues
Continual updating mitigates performance decline
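The third finding can be illustrated with a toy sketch. It is not the paper's method: the paper works with Transformer-XL models, whereas this example swaps in an add-one smoothed unigram model so that it stays self-contained. The corpus, the `perplexity` helper, and the evaluate-then-update loop are all assumptions made for illustration.

```python
import math
from collections import Counter

def perplexity(counts, tokens, alpha=1.0):
    """Add-alpha smoothed unigram perplexity; unseen tokens share one <unk> bucket."""
    vocab_size = len(counts) + 1  # +1 for the shared unseen-token bucket
    denom = sum(counts.values()) + alpha * vocab_size
    log_prob = sum(math.log((counts.get(tok, 0) + alpha) / denom) for tok in tokens)
    return math.exp(-log_prob / len(tokens))

# Hypothetical chronological stream of (period, text) pairs.
stream = [
    ("2019", "the dog sat on the mat"),
    ("2020", "the virus spread across the city and the virus spread across the country"),
    ("2021", "the virus spread across the city"),
]

# Both models start from the same (toy) training snapshot.
static = Counter("the cat sat on the mat the dog sat on the rug".split())
updated = Counter(static)

static_ppl, updated_ppl = {}, {}
for period, text in stream:
    tokens = text.split()
    # Score each period before the updated model has seen it ...
    static_ppl[period] = perplexity(static, tokens)
    updated_ppl[period] = perplexity(updated, tokens)
    # ... then fold the new text into the updated model only.
    updated.update(tokens)

for period in static_ppl:
    print(period, round(static_ppl[period], 2), round(updated_ppl[period], 2))
```

In this toy setting the static model never learns the vocabulary that appears in 2020, while the updated model has absorbed it by the time it scores the 2021 text. Real continual updating of a neural model involves further issues, such as forgetting and compute cost, that this sketch does not capture.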
Abstract
Our world is open-ended, non-stationary, and constantly evolving; thus what we talk about and how we talk about it change over time. This inherent dynamic nature of language contrasts with the current static language modelling paradigm, which trains and evaluates models on utterances from overlapping time periods. Despite impressive recent progress, we demonstrate that Transformer-XL language models perform worse in the realistic setup of predicting future utterances from beyond their training period, and that model performance becomes increasingly worse with time. We find that, while increasing model size alone -- a key driver behind recent progress -- does not solve this problem, having models that continually update their knowledge with new information can indeed mitigate this performance degradation over time. Hence, given the compilation of ever-larger language modelling datasets,…
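The evaluation setup described in the abstract, training on text up to some point in time and testing on later text, can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's pipeline: a smoothed unigram model stands in for Transformer-XL, and the toy corpus and the helper names `train_unigram` and `perplexity` are hypothetical.

```python
import math
from collections import Counter, defaultdict

def train_unigram(texts):
    """Count whitespace tokens; a stand-in for training a language model."""
    return Counter(tok for text in texts for tok in text.split())

def perplexity(counts, texts, alpha=1.0):
    """Add-alpha smoothed unigram perplexity; unseen tokens share one <unk> bucket."""
    vocab_size = len(counts) + 1  # +1 for the shared unseen-token bucket
    denom = sum(counts.values()) + alpha * vocab_size
    log_prob, n_tokens = 0.0, 0
    for text in texts:
        for tok in text.split():
            log_prob += math.log((counts.get(tok, 0) + alpha) / denom)
            n_tokens += 1
    return math.exp(-log_prob / n_tokens)

# Hypothetical toy corpus of (year, text) pairs; training data ends in 2019.
train_docs = [
    (2018, "the cat sat on the mat"),
    (2018, "the dog sat on the rug"),
    (2019, "the cat sat on the rug"),
]
# Held-out documents: one from within the training period, two from after it.
eval_docs = [
    (2019, "the dog sat on the mat"),
    (2020, "the virus spread across the city"),
    (2021, "the vaccine rollout reached the city"),
]

model = train_unigram(text for _, text in train_docs)

# Group held-out text by year and score each year separately.
eval_by_year = defaultdict(list)
for year, text in eval_docs:
    eval_by_year[year].append(text)

ppl_by_year = {year: perplexity(model, texts)
               for year, texts in sorted(eval_by_year.items())}
for year, ppl in ppl_by_year.items():
    print(year, round(ppl, 2))
```

Reporting perplexity per time bucket, rather than one aggregate number, is what makes any gap between in-period and future text visible.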
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications
Methods: Linear Layer · Cosine Annealing · Linear Warmup With Cosine Annealing · Adaptive Softmax · Adaptive Input Representations · Variational Dropout · Transformer-XL · Absolute Position Encodings · Position-Wise Feed-Forward Layer
