Evaluation of Pipelines for Data Integration into Knowledge Graphs

Marvin Hofer; Erhard Rahm

arXiv:2605.22304 · cs.AI · May 22, 2026

TL;DR

This paper introduces KGI-Bench, a benchmark for evaluating pipelines that integrate data into knowledge graphs using three quality metrics (coverage, correctness, and consistency), and provides datasets for the movie domain.

Contribution

It presents a benchmark, consisting of datasets and quality metrics, for assessing knowledge graph integration pipelines, so that pipelines can be compared and the best choices selected.

Findings

01

Evaluated 12 pipelines across different data formats.

02

Analyzed pipeline behavior using coverage, correctness, and consistency.

03

Provided datasets and ground truth for the movie domain.

Abstract

Integrating new data into knowledge graphs (KGs) typically involves different tasks that are executed within workflows or pipelines. There are many possible pipelines for a specific integration problem, but there is not yet a general approach to evaluate the overall quality and performance of such pipelines in order to determine the best choices. We therefore propose a new benchmark, KGI-Bench, to evaluate integration pipelines that ingest different kinds of input data into an existing KG. We evaluate pipelines by analyzing their output, i.e., the updated KG, with three complementary quality metrics: coverage, correctness, and consistency. We also provide benchmark datasets (a seed KG, overlapping input data in three formats, and a reference KG as ground truth) for the movie domain. To demonstrate the applicability and usefulness of the proposed benchmark, we comparatively evaluate 12…
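The abstract names the three metrics but does not define them here. As a rough illustration only, the sketch below assumes a simple triple-level reading: coverage as the share of reference-KG triples found in the updated KG, correctness as the share of updated-KG triples confirmed by the reference KG, and consistency as the share of (subject, predicate) pairs that hold a single value for a hypothetical set of single-valued ("functional") predicates. These definitions, the function names, and the toy movie triples are assumptions for illustration, not KGI-Bench's actual formulas or data.

```python
from collections import defaultdict
from typing import Dict, Set, Tuple

# Assumption: a KG is modeled as a plain set of (subject, predicate, object) triples.
Triple = Tuple[str, str, str]


def coverage(output_kg: Set[Triple], reference_kg: Set[Triple]) -> float:
    """Hypothetical: share of reference triples present in the updated KG (recall-like)."""
    if not reference_kg:
        return 1.0
    return len(output_kg & reference_kg) / len(reference_kg)


def correctness(output_kg: Set[Triple], reference_kg: Set[Triple]) -> float:
    """Hypothetical: share of updated-KG triples confirmed by the reference KG (precision-like)."""
    if not output_kg:
        return 1.0
    return len(output_kg & reference_kg) / len(output_kg)


def consistency(output_kg: Set[Triple], functional: Set[str]) -> float:
    """Hypothetical: share of (subject, functional predicate) pairs with exactly one value."""
    values: Dict[Tuple[str, str], Set[str]] = defaultdict(set)
    for s, p, o in output_kg:
        if p in functional:
            values[(s, p)].add(o)
    if not values:
        return 1.0
    single_valued = sum(1 for objs in values.values() if len(objs) == 1)
    return single_valued / len(values)


# Toy movie-domain data (invented for illustration).
reference = {("m1", "title", "Heat"), ("m1", "year", "1995"), ("m2", "title", "Alien")}
output = {
    ("m1", "title", "Heat"),
    ("m1", "year", "1995"),
    ("m1", "year", "1996"),    # conflicting second year for m1
    ("m3", "title", "Avatar"),  # triple not in the reference KG
}

print(coverage(output, reference))                  # 2 of 3 reference triples found
print(correctness(output, reference))               # 0.5 (2 of 4 output triples confirmed)
print(consistency(output, {"title", "year"}))       # 2 of 3 pairs single-valued
```

Using plain set intersection keeps the sketch short; a real benchmark would also need entity matching between the updated KG and the reference KG, since integrated entities rarely share identifiers exactly.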
