Is Your LLM Outdated? A Deep Look at Temporal Generalization

Chenghao Zhu; Nuo Chen; Yufei Gao; Yunyi Zhang; Prayag Tiwari; Benyou Wang

arXiv:2405.08460 · cs.CL · July 2, 2025 · 1 citation

TL;DR

This paper investigates how well large language models adapt to changing temporal information, introduces a new benchmark for evaluating this ability, and finds that model performance declines over time, with notable differences between open-source and closed-source models.

Contribution

It introduces FreshBench, a novel evaluation framework for assessing temporal generalization in LLMs, and provides insights into their temporal biases and adaptability.

Findings

01

Powerful models decline more rapidly over time.

02

Open-source models show better long-term adaptability.

03

Significant temporal biases affect model performance.

Abstract

The rapid advancement of Large Language Models (LLMs) has led to benchmarks that consider temporal dynamics; however, there remains a gap in understanding how well these models generalize across temporal contexts, given the inherently dynamic nature of language and information. This paper introduces the concept of temporal generalization in LLMs, including biases in past and future generalization. We then introduce FreshBench, a new evaluation framework that uses fresh text and event prediction to assess LLMs' temporal adaptability while keeping the evaluation free from data leakage and subjective bias. Experiments show significant temporal biases and a decline in performance over time. Our findings reveal that powerful models, while initially superior, tend to decline more rapidly in future generalization. Additionally, powerful open-source models…
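The abstract does not specify how FreshBench scores models or measures decline, so the following is only a minimal sketch under stated assumptions, not the authors' implementation (see the official repository for that). It assumes each fresh document carries a publication date and a per-document score (the `Sample` record, its fields, the helper names, and the data are all hypothetical). Documents dated on or before the model's training cutoff are dropped as a simple leakage guard, the remaining scores are averaged per month since the cutoff, and an ordinary least-squares slope summarizes how quickly performance falls.

```python
from dataclasses import dataclass
from datetime import date
from statistics import mean


@dataclass
class Sample:
    # Hypothetical record: one fresh document and the model's score on it.
    published: date  # publication date of the text
    score: float     # assumed metric, e.g. a normalized likelihood or accuracy


def months_after(cutoff: date, d: date) -> int:
    """Whole calendar months between the training cutoff and a publication date."""
    return (d.year - cutoff.year) * 12 + (d.month - cutoff.month)


def monthly_scores(samples: list[Sample], cutoff: date) -> dict[int, float]:
    """Average score per month after the cutoff, skipping possibly-seen text."""
    buckets: dict[int, list[float]] = {}
    for s in samples:
        if s.published <= cutoff:
            continue  # leakage guard: the model may have trained on this text
        buckets.setdefault(months_after(cutoff, s.published), []).append(s.score)
    return {m: mean(v) for m, v in sorted(buckets.items())}


def decline_rate(per_month: dict[int, float]) -> float:
    """Ordinary least-squares slope of score vs. months since cutoff."""
    if len(per_month) < 2:
        raise ValueError("need at least two distinct months to fit a slope")
    xs = list(per_month.keys())
    ys = list(per_month.values())
    x_bar, y_bar = mean(xs), mean(ys)
    num = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    den = sum((x - x_bar) ** 2 for x in xs)
    return num / den


# Toy, made-up data purely for illustration.
cutoff = date(2023, 10, 1)
samples = [
    Sample(date(2023, 9, 15), 0.90),  # before cutoff: dropped
    Sample(date(2023, 11, 5), 0.80),
    Sample(date(2023, 11, 20), 0.78),
    Sample(date(2023, 12, 3), 0.75),
    Sample(date(2024, 1, 10), 0.70),
]
per_month = monthly_scores(samples, cutoff)
rate = decline_rate(per_month)
print(per_month, rate)
```

On this toy data the slope is about -0.045 per month. Under this sketch, a more negative slope means faster decline, which is the kind of quantity one would compare across models when asking whether stronger models degrade more quickly.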

Code & Models

Repositories

freedomintelligence/freshbench (Official)

Videos

Is Your LLM Outdated? A Deep Look at Temporal Generalization (Underline)

Taxonomy

Topics: Natural Language Processing Techniques