Knowledge-based Consistency Testing of Large Language Models

Sai Sathiesh Rajan; Ezekiel Soremekun; Sudipta Chattopadhyay

arXiv:2407.12830·cs.CL·August 15, 2025

TL;DR

This paper introduces KonTest, an automated framework that uses knowledge graphs to test and improve the consistency and knowledge coverage of large language models, revealing significant gaps and errors.

Contribution

We propose KonTest, a novel knowledge-based testing framework that systematically exposes inconsistencies and knowledge gaps in large language models and offers a mitigation strategy.

Findings

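01 KonTest flags 19.2% of its generated test inputs as error-inducing (1917 errors from 9979 test inputs) across the tested LLMs.

02 KonTest reveals a 16.5% knowledge gap across all tested LLMs.

03 A mitigation method informed by KonTest's test suite reduces the LLM knowledge gap by 32.48%.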

Abstract

In this work, we systematically expose and measure the inconsistency and knowledge gaps of Large Language Models (LLMs). Specifically, we propose an automated testing framework (called KonTest) which leverages a knowledge graph to construct test cases. KonTest probes and measures the inconsistencies in the LLM's knowledge of the world via a combination of semantically equivalent queries and test oracles (metamorphic or ontological oracles). KonTest further mitigates knowledge gaps via a weighted LLM ensemble. Using four state-of-the-art LLMs (Falcon, Gemini, GPT3.5, and Llama2), we show that KonTest generates 19.2% error-inducing inputs (1917 errors from 9979 test inputs). It also reveals a 16.5% knowledge gap across all tested LLMs. A mitigation method informed by KonTest's test suite reduces the LLM knowledge gap by 32.48%. Our ablation study further shows that GPT3.5 is not suitable…
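To make the abstract's ideas concrete, here is a minimal Python sketch of a metamorphic consistency check, the reported error-rate arithmetic, and a weighted ensemble vote. It is an illustration under stated assumptions, not KonTest's implementation: the names `Model`, `metamorphic_violation`, `error_rate`, and `weighted_vote`, the toy model, and the example weights are all hypothetical, and the paper's actual query construction and weighting scheme are not described in this summary.

```python
from collections import defaultdict
from typing import Callable, Dict, List

# Hypothetical stand-in for an LLM: maps a question to a short answer string.
Model = Callable[[str], str]


def metamorphic_violation(model: Model, equivalent_queries: List[str]) -> bool:
    """Return True if the model answers semantically equivalent queries differently."""
    answers = {model(q).strip().lower() for q in equivalent_queries}
    return len(answers) > 1


def error_rate(errors: int, total: int) -> float:
    """Percentage of test inputs that induced an error."""
    return 100.0 * errors / total


def weighted_vote(answers: Dict[str, str], weights: Dict[str, float]) -> str:
    """Pick the answer with the largest summed weight across models (assumed scheme)."""
    totals: Dict[str, float] = defaultdict(float)
    for model_name, answer in answers.items():
        totals[answer] += weights[model_name]
    return max(totals, key=totals.get)


def toy_model(query: str) -> str:
    # Deliberately inconsistent toy model: says "yes" only to queries starting with "Is ".
    return "yes" if query.startswith("Is ") else "no"


# Two phrasings that should receive the same answer.
queries = ["Is a cat an animal?", "Would you say a cat is an animal?"]
violation = metamorphic_violation(toy_model, queries)

# Figures reported in the abstract: 1917 errors from 9979 test inputs.
rate = round(error_rate(1917, 9979), 1)

# Hypothetical per-model weights; model "a" outweighs "b" and "c" combined.
vote = weighted_vote(
    {"a": "yes", "b": "no", "c": "no"},
    {"a": 0.7, "b": 0.2, "c": 0.2},
)
```

In this sketch the toy model violates the metamorphic relation because its answer depends on phrasing rather than meaning, and the weighted vote shows how a more trusted model can override a majority of less trusted ones.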

Code & Models

Repositories

sparkssss/KonTest (Official)

Taxonomy

Topics: Topic Modeling · Semantic Web and Ontologies