EvoWiki: Evaluating LLMs on Evolving Knowledge
Wei Tang, Yixin Cao, Yang Deng, Jiahao Ying, Bo Wang, Yizhe Yang,, Yuyue Zhao, Qi Zhang, Xuanjing Huang, Yugang Jiang, Yong Liao

TL;DR
EvoWiki is a dynamic benchmark dataset designed to evaluate how well large language models adapt to changing knowledge over time, addressing limitations of static benchmarks and highlighting the challenges and potential solutions.
Contribution
We introduce EvoWiki, an auto-updatable dataset that captures knowledge evolution, enabling precise evaluation of LLMs' adaptation to changing information.
Findings
Current LLMs often provide outdated or incorrect responses to evolved knowledge.
RAG and CL methods show a synergistic effect in improving adaptation.
EvoWiki offers a robust benchmark for future research on knowledge evolution in LLMs.
Abstract
Knowledge utilization is a critical aspect of LLMs, and understanding how they adapt to evolving knowledge is essential for their effective deployment. However, existing benchmarks are predominantly static, failing to capture the evolving nature of LLMs and knowledge, leading to inaccuracies and vulnerabilities such as contamination. In this paper, we introduce EvoWiki, an evolving dataset designed to reflect knowledge evolution by categorizing information into stable, evolved, and uncharted states. EvoWiki is fully auto-updatable, enabling precise evaluation of continuously changing knowledge and newly released LLMs. Through experiments with Retrieval-Augmented Generation (RAG) and Contunual Learning (CL), we evaluate how effectively LLMs adapt to evolving knowledge. Our results indicate that current models often struggle with evolved knowledge, frequently providing outdated or…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsNatural Language Processing Techniques · Semantic Web and Ontologies
MethodsRefunds@Expedia|||How do I get a full refund from Expedia? · Attention Is All You Need · Linear Layer · Multi-Head Attention · Residual Connection · Adam · Layer Normalization · Weight Decay · Softmax · WordPiece
