KGHaluBench: A Knowledge Graph-Based Hallucination Benchmark for Evaluating the Breadth and Depth of LLM Knowledge

Alex Robertson; Huizhi Liang; Mahbub Gani; Rohit Kumar; Srijith Rajamohan

arXiv:2602.19643 · cs.CL · February 24, 2026

TL;DR

KGHaluBench is a knowledge graph-based benchmark designed to evaluate LLMs' hallucination tendencies by dynamically generating challenging questions and verifying responses at multiple levels, offering a comprehensive and interpretable assessment of truthfulness.

Contribution

This work introduces a novel knowledge graph-based framework for dynamically constructing multifaceted questions to evaluate LLM hallucinations, addressing limitations of static benchmarks.

Findings

01. Evaluated 25 frontier models with new accuracy and hallucination metrics.

02. Identified knowledge factors influencing hallucinations across model sizes.

03. Provided a publicly available benchmark for future hallucination mitigation research.

Abstract

Large Language Models (LLMs) possess a remarkable capacity to generate persuasive and intelligible language. However, coherence does not equate to truthfulness, as the responses often contain subtle hallucinations. Existing benchmarks are limited by static and narrow questions, leading to limited coverage and misleading evaluations. We present KGHaluBench, a Knowledge Graph-based hallucination benchmark that assesses LLMs across the breadth and depth of their knowledge, providing a fairer and more comprehensive insight into LLM truthfulness. Our framework utilises the KG to dynamically construct challenging, multifaceted questions, whose difficulty is then statistically estimated to address popularity bias. Our automated verification pipeline detects abstentions and verifies the LLM's response at both conceptual and correctness levels to identify different types of hallucinations. We…
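To make the pipeline described in the abstract more concrete, here is a minimal sketch of its general shape: build a multifaceted question from knowledge graph triples, detect abstentions, and check which facts a response supports. Everything in it is an assumption for illustration, not the authors' implementation: the toy triples, the names `build_question`, `is_abstention`, and `verify`, the abstention cue list, and the substring matching used in place of the paper's conceptual and correctness verification. Statistical difficulty estimation is not covered.

```python
from dataclasses import dataclass

# Toy knowledge graph as (subject, relation, object) triples. Illustrative data only.
TRIPLES = [
    ("Marie Curie", "field", "physics"),
    ("Marie Curie", "birthplace", "Warsaw"),
    ("Marie Curie", "award", "Nobel Prize"),
]

# Hypothetical abstention cues; a real detector would need to be far more robust.
ABSTENTION_CUES = ("i don't know", "i do not know", "not sure", "cannot answer")


@dataclass
class Question:
    entity: str
    prompt: str
    facts: dict  # relation -> expected object


def build_question(entity, triples):
    """Combine every relation known for `entity` into one multifaceted question."""
    facts = {rel: obj for subj, rel, obj in triples if subj == entity}
    if not facts:
        raise ValueError(f"no triples for {entity!r}")
    facets = ", ".join(sorted(facts))
    return Question(entity, f"Describe {entity}, covering: {facets}.", facts)


def is_abstention(response):
    """Return True if the response contains one of the assumed abstention cues."""
    text = response.lower()
    return any(cue in text for cue in ABSTENTION_CUES)


def verify(question, response):
    """Report abstention, then which expected facts appear in the response.

    Substring matching is a stand-in assumption; a missing fact is not
    necessarily a hallucination, only an unsupported facet.
    """
    if is_abstention(response):
        return {"abstained": True, "supported": [], "missing": sorted(question.facts)}
    text = response.lower()
    supported = sorted(r for r, o in question.facts.items() if o.lower() in text)
    missing = sorted(r for r in question.facts if r not in supported)
    return {"abstained": False, "supported": supported, "missing": missing}
```

Checking abstention before fact matching keeps a refusal from being scored as a wrong answer, which mirrors the abstract's separation of abstentions from hallucinations.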

Taxonomy

Topics: Misinformation and Its Impacts · Adversarial Robustness in Machine Learning · Mental Health via Writing