Enabling LLM Knowledge Analysis via Extensive Materialization
Yujia Hu, Tuan-Phong Nguyen, Shrestha Ghosh, Simon Razniewski

TL;DR
This paper introduces a comprehensive methodology for analyzing LLMs' factual knowledge by extensive materialization through recursive querying, resulting in a large knowledge base that enables detailed insights into the scope, accuracy, and biases of LLMs.
Contribution
It presents a novel recursive querying approach to fully materialize LLM knowledge, overcoming previous limitations of small sample analysis, and provides the GPTKB knowledge base as a resource.
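The recursive querying idea can be illustrated with a minimal sketch. It is not the authors' implementation: the `materialize` function, the `elicit` callback, the `max_entities` budget, and the toy lookup table standing in for an LLM are all assumptions made for illustration. The sketch starts from a seed entity, asks the "model" for triples about it, and enqueues every newly named entity for the same treatment, deduplicating only by exact string match (real consolidation would need to do more).

```python
from collections import deque
from typing import Callable

# (predicate, object, object_is_entity) as returned by the hypothetical elicit callback
Fact = tuple[str, str, bool]
Triple = tuple[str, str, str]

def materialize(seed: str,
                elicit: Callable[[str], list[Fact]],
                max_entities: int = 1000) -> tuple[list[Triple], set[str]]:
    """Breadth-first crawl: expand each entity once, follow newly named entities."""
    queue = deque([seed])
    seen = {seed}                 # entities already queued (exact-match dedup only)
    triples: list[Triple] = []
    expanded = 0
    while queue and expanded < max_entities:
        subject = queue.popleft()
        expanded += 1
        for predicate, obj, is_entity in elicit(subject):
            triple = (subject, predicate, obj)
            if triple not in triples:
                triples.append(triple)
            # Only objects flagged as entities are queried recursively.
            if is_entity and obj not in seen:
                seen.add(obj)
                queue.append(obj)
    return triples, seen

# Toy stand-in for an LLM call (hypothetical data, not model output).
TOY_MODEL: dict[str, list[Fact]] = {
    "Vannevar Bush": [("field", "engineering", False),
                      ("influenced", "Douglas Engelbart", True)],
    "Douglas Engelbart": [("invented", "computer mouse", True),
                          ("influenced_by", "Vannevar Bush", True)],
    "computer mouse": [("inventor", "Douglas Engelbart", True)],
}

def toy_elicit(subject: str) -> list[Fact]:
    return TOY_MODEL.get(subject, [])

triples, entities = materialize("Vannevar Bush", toy_elicit)
```

In a real setting `elicit` would wrap a prompt to the model and parse its answer, and the budget would bound the total number of calls.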
Findings
GPTKB contains 101 million triples for 2.9 million entities.
Analysis reveals insights into GPT-4o-mini's scale, accuracy, and biases.
The methodology enables comprehensive understanding of LLM knowledge structures.
Abstract
Large language models (LLMs) have majorly advanced NLP and AI, and next to their ability to perform a wide range of procedural tasks, a major success factor is their internalized factual knowledge. Since Petroni et al. (2019), analyzing this knowledge has gained attention. However, most approaches investigate one question at a time via modest-sized pre-defined samples, introducing an "availability bias" (Tversky & Kahneman, 1973) that prevents the analysis of knowledge (or beliefs) of LLMs beyond the experimenter's predisposition. To address this challenge, we propose a novel methodology to comprehensively materialize an LLM's factual knowledge through recursive querying and result consolidation. Our approach is a milestone for LLM research, for the first time providing constructive insights into the scope and structure of LLM knowledge (or beliefs). As a prototype, we build GPTKB,…
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Semantic Web and Ontologies
Methods: Balanced Selection
