EMERGE: A Benchmark for Updating Knowledge Graphs with Emerging Textual Knowledge

Klim Zaporojets; Daniel Daza; Edoardo Barba; Ira Assent; Roberto Navigli; Paul Groth

arXiv:2507.03617·cs.CL·April 8, 2026

TL;DR

This paper introduces EMERGE, a benchmark dataset for updating knowledge graphs with emerging textual knowledge. It pairs Wikidata snapshots with Wikipedia passages and the KG edit operations those passages induce.

Contribution

The paper constructs a large-scale dataset of Wikidata snapshots over time and Wikipedia passages, each paired with the KG edit operations it induces in a particular snapshot, to support research on KG updating methods.

Findings

01

Identified key challenges in integrating textual knowledge with existing KGs.

02

Created a dataset with 233K passages and 1.45 million KG edits over 7 years.

03

Published the dataset and models for community use.

Abstract

Knowledge Graphs (KGs) are structured knowledge repositories containing entities and relations between them. In this paper, we study the problem of automatically updating KGs over time in response to evolving knowledge in unstructured textual sources. Addressing this problem requires identifying a wide range of update operations based on the state of an existing KG at a given time and the information extracted from text. This contrasts with traditional information extraction pipelines, which extract knowledge from text independently of the current state of a KG. To address this challenge, we propose a method for construction of a dataset consisting of Wikidata KG snapshots over time and Wikipedia passages paired with the corresponding edit operations that they induce in a particular KG snapshot. The resulting dataset comprises 233K Wikipedia passages aligned with a total of 1.45 million…
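The contrast the abstract draws, between extraction that ignores the KG and updating that depends on its current state, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the triple representation, the `EditOp` class, the operation labels `add_triple` and `already_in_kg`, and the toy identifiers are hypothetical and are not taken from the EMERGE dataset schema or the paper's set of edit operations.

```python
from dataclasses import dataclass

# A triple is (subject, relation, object); plain strings stand in for KG identifiers.
Triple = tuple[str, str, str]


@dataclass(frozen=True)
class EditOp:
    """Hypothetical edit record; the labels are illustrative, not the paper's."""
    kind: str  # "add_triple" or "already_in_kg"
    triple: Triple


def derive_edits(snapshot: set[Triple], extracted: list[Triple]) -> list[EditOp]:
    """Decide what each extracted triple means relative to the current snapshot."""
    ops = []
    for triple in extracted:
        if triple in snapshot:
            # The snapshot already holds this fact, so the KG needs no change.
            ops.append(EditOp("already_in_kg", triple))
        else:
            # The fact is new with respect to this snapshot.
            ops.append(EditOp("add_triple", triple))
    return ops


def apply_edits(snapshot: set[Triple], ops: list[EditOp]) -> set[Triple]:
    """Return a new snapshot with all 'add_triple' operations applied."""
    return snapshot | {op.triple for op in ops if op.kind == "add_triple"}


# Toy data: the same extracted triples yield different edits for different snapshots.
snapshot_old = {("ent:A", "rel:member_of", "ent:B")}
extracted = [
    ("ent:A", "rel:member_of", "ent:B"),
    ("ent:A", "rel:located_in", "ent:C"),
]

ops = derive_edits(snapshot_old, extracted)
snapshot_new = apply_edits(snapshot_old, ops)
ops_after = derive_edits(snapshot_new, extracted)
```

In this sketch the same passage output produces one addition against the older snapshot and none against the updated one, which is the state dependence that a KG-independent extraction pipeline does not capture.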

Code & Models

Datasets

klimzaporojets/emerge-benchmark
dataset · 743 downloads