KGHaluBench: A Knowledge Graph-Based Hallucination Benchmark for Evaluating the Breadth and Depth of LLM Knowledge
Alex Robertson, Huizhi Liang, Mahbub Gani, Rohit Kumar, Srijith Rajamohan

TL;DR
KGHaluBench is a knowledge graph-based benchmark designed to evaluate LLMs' hallucination tendencies by dynamically generating challenging questions and verifying responses at multiple levels, offering a comprehensive and interpretable assessment of truthfulness.
Contribution
This work introduces a novel knowledge graph-based framework for dynamically constructing multifaceted questions to evaluate LLM hallucinations, addressing limitations of static benchmarks.
Findings
Evaluated 25 frontier models with new accuracy and hallucination metrics.
Identified knowledge factors influencing hallucinations across model sizes.
Provided a publicly available benchmark for future hallucination mitigation research.
Abstract
Large Language Models (LLMs) possess a remarkable capacity to generate persuasive and intelligible language. However, coherence does not equate to truthfulness, as the responses often contain subtle hallucinations. Existing benchmarks rely on static and narrow questions, leading to limited coverage and misleading evaluations. We present KGHaluBench, a Knowledge Graph-based hallucination benchmark that assesses LLMs across the breadth and depth of their knowledge, providing a fairer and more comprehensive insight into LLM truthfulness. Our framework utilises the KG to dynamically construct challenging, multifaceted questions, whose difficulty is then statistically estimated to address popularity bias. Our automated verification pipeline detects abstentions and verifies the LLM's response at both conceptual and correctness levels to identify different types of hallucinations. We…
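The staged verification described in the abstract (abstention detection, then a conceptual check, then a correctness check) can be sketched as a small decision cascade. This is a toy illustration under loud assumptions, not the authors' pipeline. It assumes "conceptual" means "the response is about the intended entity" and "correctness" means "each expected fact appears in the response". It also uses naive substring matching where a real system would likely use a stronger judge. `Verdict`, `ABSTAIN_PHRASES` and `verify_response` are hypothetical names.

```python
from enum import Enum


class Verdict(Enum):
    """Hypothetical outcome labels for one model response."""
    ABSTAINED = "abstained"
    CONCEPTUAL_HALLUCINATION = "conceptual_hallucination"
    CORRECTNESS_HALLUCINATION = "correctness_hallucination"
    CORRECT = "correct"


# Assumption: a short list of refusal phrases stands in for abstention detection.
ABSTAIN_PHRASES = ("i don't know", "i do not know", "i'm not sure", "i cannot answer")


def verify_response(response: str, entity: str,
                    gold: dict[str, str]) -> tuple[Verdict, dict[str, bool]]:
    """Toy three-stage check; returns a verdict plus per-fact results."""
    text = response.lower()
    # Stage 1: abstention detection. An abstention is not counted as a hallucination.
    if any(phrase in text for phrase in ABSTAIN_PHRASES):
        return Verdict.ABSTAINED, {}
    # Stage 2: conceptual level. Does the answer address the intended entity at all?
    if entity.lower() not in text:
        return Verdict.CONCEPTUAL_HALLUCINATION, {}
    # Stage 3: correctness level. Is each expected object mentioned?
    per_fact = {rel: obj.lower() in text for rel, obj in gold.items()}
    if all(per_fact.values()):
        return Verdict.CORRECT, per_fact
    return Verdict.CORRECTNESS_HALLUCINATION, per_fact
```

For example, `verify_response("Marie Curie was born in Paris and worked in physics.", "Marie Curie", {"birthplace": "Warsaw", "field": "physics"})` yields a correctness-level hallucination with the birthplace fact marked `False`. Ordering the stages this way keeps abstentions separate from hallucinations and distinguishes answering about the wrong thing from stating wrong facts about the right thing.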
Taxonomy
Topics: Misinformation and Its Impacts · Adversarial Robustness in Machine Learning · Mental Health via Writing
