EvoWiki: Evaluating LLMs on Evolving Knowledge

Wei Tang; Yixin Cao; Yang Deng; Jiahao Ying; Bo Wang; Yizhe Yang; Yuyue Zhao; Qi Zhang; Xuanjing Huang; Yugang Jiang; Yong Liao

arXiv:2412.13582·cs.CL·December 19, 2024

TL;DR

EvoWiki is a dynamic benchmark dataset for evaluating how well large language models adapt to knowledge that changes over time. It addresses the limitations of static benchmarks and highlights both the challenges of adapting to evolved knowledge and potential remedies such as retrieval-augmented generation and continual learning.

Contribution

We introduce EvoWiki, an auto-updatable dataset that captures knowledge evolution, enabling precise evaluation of LLMs' adaptation to changing information.

Findings

01

Current LLMs often provide outdated or incorrect responses to evolved knowledge.

02

Retrieval-augmented generation (RAG) and continual learning (CL) methods show a synergistic effect in improving adaptation.

03

EvoWiki offers a robust benchmark for future research on knowledge evolution in LLMs.
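Finding 01 separates outdated answers from plainly incorrect ones. The sketch below shows one way such a distinction could be scored. It assumes that each evolved question carries both its current gold answer and the superseded one, and that answers are compared by normalized exact match. Both are illustrative assumptions, not the paper's stated evaluation protocol, and the function names are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace before comparing answers."""
    return " ".join(text.lower().split())


def score_answer(prediction: str, current: str, previous: str) -> str:
    """Classify an answer to an evolved question (assumed exact-match protocol)."""
    pred = normalize(prediction)
    if pred == normalize(current):
        return "correct"
    if pred == normalize(previous):
        return "outdated"  # the model reproduced the superseded fact
    return "incorrect"


def summarize(rows: Iterable[Tuple[str, str, str]]) -> Dict[str, float]:
    """Fraction of answers per outcome; rows are (prediction, current, previous)."""
    counts = Counter(score_answer(p, c, o) for p, c, o in rows)
    total = sum(counts.values())
    return {label: (counts[label] / total if total else 0.0)
            for label in ("correct", "outdated", "incorrect")}
```

Keeping "outdated" as its own bucket, rather than folding it into "incorrect", is what makes it possible to tell whether a model is stuck on stale knowledge or simply wrong.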

Abstract

Knowledge utilization is a critical aspect of LLMs, and understanding how they adapt to evolving knowledge is essential for their effective deployment. However, existing benchmarks are predominantly static, failing to capture the evolving nature of LLMs and knowledge, leading to inaccuracies and vulnerabilities such as contamination. In this paper, we introduce EvoWiki, an evolving dataset designed to reflect knowledge evolution by categorizing information into stable, evolved, and uncharted states. EvoWiki is fully auto-updatable, enabling precise evaluation of continuously changing knowledge and newly released LLMs. Through experiments with Retrieval-Augmented Generation (RAG) and Continual Learning (CL), we evaluate how effectively LLMs adapt to evolving knowledge. Our results indicate that current models often struggle with evolved knowledge, frequently providing outdated or…
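The abstract names three knowledge states but does not spell out how a fact is assigned to one. Below is a minimal sketch under assumed definitions: a fact is stable if it has the same value in an older and a newer snapshot, evolved if its value changed, and uncharted if it appears only in the newer snapshot. The (subject, relation) fact representation, the snapshot dictionaries, and the `categorize` function are hypothetical illustrations, not EvoWiki's actual pipeline.

```python
from typing import Dict, Optional, Tuple

# Hypothetical fact representation: (subject, relation) -> object value.
Fact = Tuple[str, str]


def categorize(fact: Fact,
               old_snapshot: Dict[Fact, str],
               new_snapshot: Dict[Fact, str]) -> Optional[str]:
    """Label a fact by comparing an older and a newer snapshot.

    Assumed definitions (illustrative only):
      stable    - present in both snapshots with the same value
      evolved   - present in both snapshots with a changed value
      uncharted - present only in the newer snapshot
    """
    if fact not in new_snapshot:
        return None  # facts missing from the newer snapshot are ignored here
    if fact not in old_snapshot:
        return "uncharted"
    if old_snapshot[fact] == new_snapshot[fact]:
        return "stable"
    return "evolved"


if __name__ == "__main__":
    # Toy snapshots with made-up entities; not data from EvoWiki.
    old = {("EntityA", "leader"): "Person1", ("EntityB", "capital"): "City1"}
    new = {("EntityA", "leader"): "Person2", ("EntityB", "capital"): "City1",
           ("EntityC", "founded"): "2024"}
    for fact in new:
        print(fact, categorize(fact, old, new))
```

Because the labels depend only on two snapshots, re-running the comparison against a fresher snapshot would refresh them, which is one plausible reading of how an auto-updatable dataset could stay current.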

Taxonomy

Topics: Natural Language Processing Techniques · Semantic Web and Ontologies

Methods: Attention Is All You Need · Linear Layer · Multi-Head Attention · Residual Connection · Adam · Layer Normalization · Weight Decay · Softmax · WordPiece