GPTKB v1.5: A Massive Knowledge Base for Exploring Factual LLM Knowledge

Yujia Hu; Tuan-Phong Nguyen; Shrestha Ghosh; Moritz Müller; Simon Razniewski

arXiv:2507.05740·cs.CL·July 9, 2025

TL;DR

This paper introduces GPTKB v1.5, a large-scale, densely interlinked knowledge base built from GPT-4.1, enabling improved exploration, querying, and analysis of factual knowledge in language models.

Contribution

It applies the GPTKB methodology for massive-recursive LLM knowledge materialization (Hu et al., ACL 2025) to GPT-4.1 and demonstrates the result through a comprehensive knowledge base and interactive exploration tools.

Findings

01 Knowledge base contains 100 million triples

02 Enables link-traversal and SPARQL-based querying

03 Facilitates systematic analysis of LLM factual knowledge
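
To make the second finding concrete, here is a minimal sketch of the two access styles over a tiny in-memory triple set. It is not the GPTKB interface: the triples are toy examples that are not taken from GPTKB, and the helper names `match` and `traverse` are hypothetical. `match` mimics a single SPARQL-style triple pattern, with `None` playing the role of a variable, and `traverse` follows outgoing links breadth-first up to a hop limit.

```python
from collections import deque

# Toy triples for illustration only; they are NOT taken from GPTKB.
TRIPLES = [
    ("Marie Curie", "bornIn", "Warsaw"),
    ("Marie Curie", "spouse", "Pierre Curie"),
    ("Pierre Curie", "bornIn", "Paris"),
    ("Warsaw", "capitalOf", "Poland"),
    ("Paris", "capitalOf", "France"),
]


def match(triples, s=None, p=None, o=None):
    """Return triples matching a pattern; None is a wildcard, like a SPARQL variable."""
    return [
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


def traverse(triples, start, max_hops):
    """Breadth-first link traversal: map each reachable entity to its hop distance."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if dist[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for _, _, obj in match(triples, s=node):
            if obj not in dist:
                dist[obj] = dist[node] + 1
                queue.append(obj)
    return dist


# Structured query: every (subject, "bornIn", object) triple.
born_in = match(TRIPLES, p="bornIn")

# Link traversal: everything within two hops of one entity.
reachable = traverse(TRIPLES, "Marie Curie", max_hops=2)
```

A real SPARQL engine joins many such patterns and applies filters and aggregates; the single-pattern matcher here only illustrates the basic building block.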

Abstract

Language models are powerful tools, yet their factual knowledge is still poorly understood and inaccessible to ad-hoc browsing and scalable statistical analysis. This demonstration introduces GPTKB v1.5, a densely interlinked 100-million-triple knowledge base (KB) built for $14,000 from GPT-4.1, using the GPTKB methodology for massive-recursive LLM knowledge materialization (Hu et al., ACL 2025). The demonstration experience focuses on three use cases: (1) link-traversal-based LLM knowledge exploration, (2) SPARQL-based structured LLM knowledge querying, and (3) comparative exploration of the strengths and weaknesses of LLM knowledge. Massive-recursive LLM knowledge materialization is a groundbreaking opportunity both for the research area of systematic analysis of LLM knowledge and for automated KB construction. The GPTKB demonstrator is accessible at https://gptkb.org.
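
The abstract names recursive materialization without spelling out the loop, so the following is a rough sketch of the general idea, not the authors' pipeline. It assumes a breadth-first expansion: ask the model for triples about an entity, store them, and enqueue every newly seen object as the next subject. The function `elicit_triples` is a hypothetical stand-in for an LLM call, backed here by an invented lookup table, and `max_entities` is an assumed budget parameter.

```python
from collections import deque

# Invented stand-in for model output; a real pipeline would prompt an LLM instead.
FAKE_LLM_OUTPUT = {
    "Vannevar Bush": [
        ("Vannevar Bush", "field", "Engineering"),
        ("Vannevar Bush", "employer", "MIT"),
    ],
    "MIT": [("MIT", "locatedIn", "Cambridge, Massachusetts")],
}


def elicit_triples(entity):
    """Hypothetical LLM call: return (subject, predicate, object) triples for an entity."""
    return FAKE_LLM_OUTPUT.get(entity, [])


def materialize(seed, max_entities):
    """Expand outward from a seed entity until the frontier is empty or the budget is spent."""
    kb = []
    seen = {seed}
    frontier = deque([seed])
    expanded = 0
    while frontier and expanded < max_entities:
        subject = frontier.popleft()
        expanded += 1
        for s, p, o in elicit_triples(subject):
            kb.append((s, p, o))
            if o not in seen:  # each entity is queued at most once
                seen.add(o)
                frontier.append(o)
    return kb


kb = materialize("Vannevar Bush", max_entities=10)
small_kb = materialize("Vannevar Bush", max_entities=1)
```

The sketch treats every object as a candidate entity; a practical system would also need to separate entities from literals and to deduplicate names. For scale, the figures in the abstract work out to $14,000 / 100,000,000 triples = $0.00014 per triple.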

Code & Models

Datasets

Knowledge-aware-AI/GPTKB_v1.5
dataset · 62 downloads

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Text Readability and Simplification