KGQuiz: Evaluating the Generalization of Encoded Knowledge in Large Language Models
Yuyang Bai, Shangbin Feng, Vidhisha Balachandran, Zhaoxuan Tan, Shiqi Lou, Tianxing He, Yulia Tsvetkov

TL;DR
KGQuiz is a benchmark for evaluating how well the knowledge encoded in large language models generalizes across three knowledge domains and five task formats of increasing complexity. LLMs do well on the simple tasks but struggle with complex reasoning.
Contribution
The paper introduces KGQuiz, a scalable, triplet-based benchmark covering diverse knowledge domains and task formats to systematically assess LLMs' knowledge generalization capabilities.
Findings
LLMs perform well on simple knowledge QA tasks
Complex reasoning and domain-specific tasks remain challenging for LLMs
KGQuiz enables nuanced analysis of LLMs' knowledge abilities across domains
Abstract
Large language models (LLMs) demonstrate remarkable performance on knowledge-intensive tasks, suggesting that real-world knowledge is encoded in their model parameters. However, besides explorations on a few probing tasks in limited knowledge domains, it is not well understood how to evaluate LLMs' knowledge systematically and how well their knowledge abilities generalize, across a spectrum of knowledge domains and progressively complex task formats. To this end, we propose KGQuiz, a knowledge-intensive benchmark to comprehensively investigate the knowledge generalization abilities of LLMs. KGQuiz is a scalable framework constructed from triplet-based knowledge, which covers three knowledge domains and consists of five tasks with increasing complexity: true-or-false, multiple-choice QA, blank filling, factual editing, and open-ended knowledge generation. To gain a better understanding…
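To make the idea of triplet-based tasks concrete, here is a minimal sketch of how the two simplest formats (true-or-false and multiple-choice QA) could be derived from (head, relation, tail) triplets. It is an illustration only: the summary above does not describe the actual KGQuiz construction procedure, so the toy facts, the `Triplet` type, the function names, and the negative-sampling strategy are all assumptions.

```python
import random
from typing import NamedTuple


class Triplet(NamedTuple):
    head: str
    relation: str
    tail: str


# Toy knowledge graph with illustrative facts (not taken from the benchmark).
TRIPLETS = [
    Triplet("Paris", "capital_of", "France"),
    Triplet("Berlin", "capital_of", "Germany"),
    Triplet("Tokyo", "capital_of", "Japan"),
    Triplet("Ottawa", "capital_of", "Canada"),
]


def wrong_tails(triplet, triplets):
    """Tails that would make (head, relation, tail) a statement absent from the graph."""
    known = set(triplets)
    candidates = {t.tail for t in triplets}
    return sorted(
        tail for tail in candidates
        if Triplet(triplet.head, triplet.relation, tail) not in known
    )


def make_true_false(triplet, triplets, rng):
    """Hypothetical true-or-false item: keep the triplet, or corrupt its tail."""
    if rng.random() < 0.5:
        return triplet, True
    corrupted = triplet._replace(tail=rng.choice(wrong_tails(triplet, triplets)))
    return corrupted, False


def make_multiple_choice(triplet, triplets, rng, n_options=4):
    """Hypothetical multiple-choice item: the true tail plus sampled distractors."""
    distractors = rng.sample(wrong_tails(triplet, triplets), n_options - 1)
    options = distractors + [triplet.tail]
    rng.shuffle(options)
    return {
        "question": (triplet.head, triplet.relation, "?"),
        "options": options,
        "answer": options.index(triplet.tail),
    }


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the generated items are reproducible
    print(make_true_false(TRIPLETS[0], TRIPLETS, rng))
    print(make_multiple_choice(TRIPLETS[0], TRIPLETS, rng))
```

Filtering distractors against the known triplets avoids labeling a statement false when the graph actually contains it, which matters for relations that admit more than one valid tail.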
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Computational and Text Analysis Methods
