Knowledge-based Consistency Testing of Large Language Models
Sai Sathiesh Rajan, Ezekiel Soremekun, Sudipta Chattopadhyay

TL;DR
This paper introduces KonTest, an automated framework that uses knowledge graphs to test and improve the consistency and knowledge coverage of large language models, revealing significant gaps and errors.
Contribution
We propose KonTest, a novel knowledge-based testing framework that systematically exposes inconsistencies and knowledge gaps in large language models and offers a mitigation strategy.
Findings
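KonTest generates 19.2% error-inducing inputs (1917 errors from 9979 test inputs) across four LLMs: Falcon, Gemini, GPT3.5, and Llama2.
It reveals a 16.5% knowledge gap across all tested LLMs.
A mitigation method informed by KonTest's test suite reduces the LLM knowledge gap by 32.48%.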
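As a quick sanity check, the 19.2% figure can be recomputed from the raw counts given in the abstract (1917 errors from 9979 test inputs). The helper name `rate_percent` is only illustrative and does not come from the paper.

```python
# Counts as reported in the abstract.
errors = 1917
test_inputs = 9979


def rate_percent(count: int, total: int) -> float:
    """Return count / total expressed as a percentage."""
    return 100.0 * count / total


error_rate = rate_percent(errors, test_inputs)
print(f"{error_rate:.1f}%")  # 19.2%
```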
Abstract
In this work, we systematically expose and measure the inconsistency and knowledge gaps of Large Language Models (LLMs). Specifically, we propose an automated testing framework (called KonTest) which leverages a knowledge graph to construct test cases. KonTest probes and measures the inconsistencies in the LLM's knowledge of the world via a combination of semantically-equivalent queries and test oracles (metamorphic or ontological). KonTest further mitigates knowledge gaps via a weighted LLM ensemble. Using four state-of-the-art LLMs (Falcon, Gemini, GPT3.5, and Llama2), we show that KonTest generates 19.2% error-inducing inputs (1917 errors from 9979 test inputs). It also reveals a 16.5% knowledge gap across all tested LLMs. A mitigation method informed by KonTest's test suite reduces the LLM knowledge gap by 32.48%. Our ablation study further shows that GPT3.5 is not suitable…
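To make the idea of semantically-equivalent queries with a metamorphic oracle concrete, here is a minimal sketch. It is not KonTest's implementation: the toy knowledge graph, the paraphrase templates, and the names `equivalent_queries`, `metamorphic_check`, and `stub_model` are all assumptions made for illustration, and `stub_model` stands in for a real LLM call.

```python
from typing import Callable, Dict, List, Tuple

Triple = Tuple[str, str, str]  # (subject, relation, object)

# Toy knowledge graph; illustrative facts, not data from the paper.
KNOWLEDGE_GRAPH: List[Triple] = [
    ("Paris", "capital_of", "France"),
    ("Canberra", "capital_of", "Australia"),
]

# Hypothetical paraphrase templates: each relation maps to several
# phrasings that should receive the same yes/no answer.
TEMPLATES: Dict[str, List[str]] = {
    "capital_of": [
        "Is {s} the capital of {o}?",
        "Is the capital of {o} {s}?",
    ],
}


def equivalent_queries(triple: Triple) -> List[str]:
    """Build semantically equivalent questions for one graph edge."""
    subject, relation, obj = triple
    return [t.format(s=subject, o=obj) for t in TEMPLATES[relation]]


def metamorphic_check(triple: Triple, ask: Callable[[str], str]) -> bool:
    """Return True if every paraphrase receives the same answer."""
    answers = {ask(q).strip().lower() for q in equivalent_queries(triple)}
    return len(answers) == 1


def stub_model(query: str) -> str:
    """Stand-in for an LLM; deliberately inconsistent on one phrasing."""
    if query.startswith("Is the capital of Australia"):
        return "No"
    return "Yes"


# Edges whose paraphrases got conflicting answers are flagged as errors.
inconsistent = [
    t for t in KNOWLEDGE_GRAPH if not metamorphic_check(t, stub_model)
]
print(inconsistent)
```

The check needs no ground-truth label: it only compares the model's answers to each other, so any disagreement among paraphrases is flagged as an inconsistency.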
Taxonomy
Topics: Topic Modeling · Semantic Web and Ontologies
