Benchmarking Large Language Models for Knowledge Graph Validation

Farzad Shami; Stefano Marchesin; Gianmaria Silvello

arXiv:2602.10748 · cs.DB · February 12, 2026



TL;DR

This paper introduces FactCheck, a comprehensive benchmark for evaluating large language models in knowledge graph fact validation, highlighting current limitations and the need for further research.

Contribution

The paper presents FactCheck, a new benchmark with datasets and evaluation methods for assessing LLMs on KG fact validation in three settings: the models' internal knowledge, external evidence, and multi-model consensus.

Findings

1. LLMs show promise but lack stability for real-world KG validation.
2. External evidence via RAG yields inconsistent performance improvements.
3. Multi-model consensus strategies do not consistently outperform individual models.

Abstract

Knowledge Graphs (KGs) store structured factual knowledge by linking entities through relationships and are crucial for many applications. These applications depend on the KG's factual accuracy, so verifying facts is essential yet challenging. Expert manual verification is ideal but impractical at scale. Automated methods show promise but are not ready for real-world KGs. Large Language Models (LLMs) offer potential through their semantic understanding and access to knowledge, yet their suitability and effectiveness for KG fact validation remain largely unexplored. In this paper, we introduce FactCheck, a benchmark designed to evaluate LLMs for KG fact validation across three key dimensions: (1) LLMs' internal knowledge; (2) external evidence via Retrieval-Augmented Generation (RAG); and (3) aggregated knowledge employing a multi-model consensus strategy. We evaluated open-source and…
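The abstract names a multi-model consensus strategy but does not say here how per-model verdicts are combined. A minimal sketch, assuming simple majority voting over true/false verdicts for each KG triple; the aggregation rule, the label strings, the tie handling, and the function name `consensus_verdict` are all hypothetical and not taken from the paper:

```python
from collections import Counter
from typing import Optional


def consensus_verdict(verdicts: list[str], tie_breaker: Optional[str] = None) -> Optional[str]:
    """Majority vote over per-model verdicts for a single KG triple.

    Hypothetical aggregation rule (not the paper's): the most frequent label
    wins; if two or more labels tie for first place, `tie_breaker` is
    returned (None when no tie-breaker is supplied).
    """
    if not verdicts:
        return None  # no model produced a verdict for this triple
    # most_common() returns (label, count) pairs sorted by count, descending
    ranked = Counter(verdicts).most_common()
    top_label, top_count = ranked[0]
    if len(ranked) > 1 and ranked[1][1] == top_count:
        return tie_breaker  # no strict majority: defer to the tie-breaker
    return top_label


if __name__ == "__main__":
    # Hypothetical verdicts from three models on the triple (Rome, capitalOf, Italy)
    verdicts = {"model_a": "true", "model_b": "true", "model_c": "false"}
    print(consensus_verdict(list(verdicts.values())))  # prints: true
```

Returning a tie-breaker instead of an arbitrary label keeps the sketch deterministic when an even number of models split their votes; a real setup might instead defer to the strongest single model or flag the triple for manual review.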


Code & Models

Datasets

FactCheck-AI/FactCheck
Dataset · 247 downloads


Taxonomy

Topics: Advanced Graph Neural Networks · Topic Modeling · Data Quality and Management